Executive Summary


This document explores and describes the state-of-the-art cybersecurity solutions and technologies 
for Electrical Power and Energy Systems (EPES). Moreover, it identifies how the corresponding 
solutions can be efficiently evaluated by utilizing specific Key Performance Indicators (KPIs). In 
particular, for this study, the NIST Framework for Improving Critical Infrastructure Cybersecurity was 
adopted, utilizing the five defined functions, namely a) Identify, b) Protect, c) Detect, d) Respond and 
e) Recover. For each of the aforementioned functions, the respective solutions and evaluation 
processes were analyzed. Finally, based on this study, specific recommendations are extracted, thus
providing useful directions regarding the tools and methods that will be developed during the
SDN-microSENSE project. In more detail, the recommendations extracted by this deliverable are organized
into 15 aspects related to the project, namely:


1) asset management, 

2) business environment, 

3) governance and risk management, 
4) risk assessment, 

5) risk management strategy, 

6) supply chain risk management, 

7) identity management and access control,
8) awareness and training, 

9) data security, 

10) information protection processes,
11) maintenance, 

12) protective technology, 

13) intrusion detection and prevention processes, 
14) anomaly detection, and 

15) incident response. 
1. Introduction


1.1 Purpose of this document 

This document presents state of the art cybersecurity solutions, technologies and best practices for 
the energy sector. In addition, the appropriate evaluation processes and KPIs related to these solutions 
are identified. To conclude, this deliverable constitutes a benchmark regarding the cybersecurity 
technologies, tools and solutions that can be adopted during the SDN-microSENSE project. 


1.2 Methodology 


The deliverable is based on the NIST - Framework for Improving Critical Infrastructure Cybersecurity 
[1], which is “a voluntary guidance based on existing practices, guidelines and standards for 
organizations to handle and mitigate cybersecurity issues efficiently. It was designed to foster risk and 
cybersecurity management communications between both internal and external organizational 
stakeholders.” As illustrated in Figure 1, the framework consists of five functions that are analyzed 
below. 




Figure 1: The Five Functions of the NIST framework. [1] 


As presented in the Framework, these functions constitute its highest level of abstraction,
representing the five primary pillars of a successful and holistic cybersecurity program. They are
further analyzed as follows:

1. Identify: The Identify function is responsible for developing an organizational understanding 
related to handling cybersecurity risks in a critical infrastructure associated with assets, people and 
data. 

2. Protect: The Protect function includes the appropriate actions related to the successful execution 
of the services taking place in a critical infrastructure. Its actions are characterized by the capability 
to mitigate the impact of a possible security event. 

3. Detect: The Detect function comprises the appropriate measures for the timely detection of possible
security events.

4. Respond: Accordingly, the Respond function is responsible for taking the necessary actions
against the security events identified by the previous function, thus mitigating their potential
impact.

5. Recover: Finally, the Recover function undertakes to recover the normal functionality of those 
services affected by a security event. Moreover, it identifies plans and activities related to the 
resilience and restoration services of the critical infrastructure in case of a cyberattack. 


1.3 Structure of this document 


This document is divided into the following sections. 

1. Introduction: This section introduces the reader to the document by explaining its purpose, the
methodology adopted, its structure as well as the relation of the specific deliverable with the other
tasks and deliverables of the SDN-microSENSE project.

2. Background: This section includes a short introductory background related to the energy sector as
a critical infrastructure. In addition, it provides an inventory of available guidelines and standards
for smart grid cybersecurity.

3. State of the art cybersecurity solutions and technologies: This section presents state of the art
cybersecurity solutions and technologies across the five functions of the NIST Framework for
Improving Critical Infrastructure Cybersecurity. Also, it describes how the corresponding solutions
can be evaluated by identifying specific KPIs.

4. Recommendations: This includes specific recommendations extracted by the study conducted in
Section 3. These recommendations can be used by the other tasks and deliverables of the project.

5. Conclusions: This section gives the concluding remarks of this deliverable.


1.4 Relation to other Work Packages 


Figure 2 illustrates the relation of the specific deliverable to the other tasks and deliverables. In
particular, D2.1 provides feedback to each technical WP, namely WP2, WP3, WP4 and WP5, by
identifying the corresponding state of the art solutions. Specifically, based on the Grant Agreement, D2.1
gives input to Task 2.3 by contributing to the definition of the SDN-microSENSE platform and its
technical specifications. Moreover, it guides T4.1, T4.2 and T4.3, mainly by identifying how SDN
technology can be used to mitigate possible cyberattacks. Finally, it also guides T5.1 and T5.2 regarding
the various SIEM and IDS systems, respectively.
2. Background


This section briefly discusses the key background concepts relevant to the structure of this
deliverable, as well as its focus area and the relevant guidelines and recommendations.


2.1 Critical Energy/Electric Infrastructure 


Electricity is a subsector of the energy sector, defined as “Infrastructures and facilities for generation 
and transmission of electricity in respect of supply electricity” [2], and constitutes one of the identified 
European Critical Infrastructures. According to the definition used within the European Union, a critical 
infrastructure is “an asset, system or part thereof located in Member States which is essential for the 
maintenance of vital societal functions, health, safety, security, economic or social well-being of 
people, and the disruption or destruction of which would have a significant impact in a Member State 
as a result of the failure to maintain those functions” [2].


The electricity sector is commonly divided into three major components, namely: (i) generation
systems, (ii) the high voltage transmission grid, and (iii) distribution systems, as presented in Figure 3.


[Figure 3 diagram. Color key: red, generation; blue, transmission; green, distribution; black, customer.
Elements shown: generating station; step-up transformer; transmission lines (765, 500, 345, 230, and
138 kV); transmission customer (138 kV or 230 kV); substation step-down transformer; subtransmission
customer (26 kV and 69 kV); primary customer (13 kV and 4 kV); secondary customer (120 V and 240 V).]

Figure 3: Simplified electricity grid illustration. [3] 


In terms of cyber security, the electricity sector presents unique peculiarities primarily due to the 
following parameters, as described in [4]: 


1. Real-time requirements - some energy systems need to react so fast that standard security
measures such as authentication of a command or verification of a digital signature can simply not
be introduced due to the delay these measures impose.

2. Cascading effects - electricity grids and gas pipelines are strongly interconnected across Europe 
and well beyond the EU. An outage in one country might trigger blackouts or shortages of supply 
in other areas and countries. 

3. Combined legacy systems with new technologies - many elements of the energy system were 
designed and built well before cybersecurity considerations came into play. This legacy now needs 
to interact with the most recent state-of-the-art equipment for automation and control, such as 
smart meters or connected appliances, and devices from the Internet of Things without being 
exposed to cyber-threats. 


The European Commission’s recommendation on cybersecurity in the energy sector [5] addresses 
these three areas as: 


1) Real-time requirements:

a. apply the most recent security standards for new installations wherever adequate and
consider complementary physical security measures where the installed base of old
installations cannot be sufficiently protected by cybersecurity mechanisms;

b. implement international standards on cybersecurity and adequate specific technical
standards for secure real-time communication as soon as respective products become
commercially available;

c. consider real-time constraints in the overall security concept for assets, especially in asset
classification;

d. consider privately owned networks for tele-protection schemes to ensure the quality of
service level required for real-time constraints; when using public communication networks,
operators should consider ensuring specific bandwidth allocation, latency requirements and
communication security measures;

e. split the overall system into logical zones and within each zone, define time and process
constraints in order to enable the application of suitable cybersecurity measures or to
consider alternative protection methods.


2) Cascading effects:

a. ensure that new devices, including Internet of Things devices, have and will maintain a level
of cybersecurity appropriate to a site's criticality;

b. adequately consider cyber-physical effects when establishing and periodically reviewing
business continuity plans;

c. establish design criteria and an architecture for a resilient grid, which could be achieved by:

i. putting in place in-depth defense measures per site, tailored to a site's criticality;

ii. identifying critical nodes, both in terms of power production capacity and customer
impact; critical functions of a grid should be designed to mitigate risk that can cause
cascading effects by considering redundancy, resilience to phase oscillations and
protections against cascaded load cut-off;

iii. collaborating with other relevant operators and with technology suppliers to prevent
cascading effects by applying appropriate measures and services;

iv. designing and building communication and control networks with a view to confining
the effects of any physical and logical failures to limited parts of the networks and to
ensuring adequate and swift mitigation measures.


3) Combined legacy systems with new technologies:

a. analyze the risks of connecting legacy and Internet of Things concepts and be aware about
internal and external interfaces and their vulnerabilities;

b. take suitable measures against malicious attacks originating from large numbers of
maliciously controlled consumer devices or applications;

c. establish an automated monitoring and analysis capability for security-related events in
legacy and Internet of Things environments, such as unsuccessful attempts to log-in, door
alarms for cabinet opening or other events;

d. conduct on a regular basis specific cybersecurity risk analysis on all legacy installations,
especially when connecting old and new technologies; since the legacy installations often
represent a very large number of assets, risk analysis might be done by asset classes;

e. update software and hardware of legacy and Internet of Things systems to the most recent
version whenever adequate; in so doing, energy network operators should consider
complementary measures such as system segregation or adding external security barriers
where patching or updating would be adequate but is not possible, for instance unsupported
products;

f. formulate tenders with cybersecurity in mind, that is to say demand information about
security features, demand compliance with existing cybersecurity standards, ensure 
continuous alerting, patching and mitigation proposals if vulnerabilities are discovered, and 
clarify vendor liability in the event of cyber-attacks or incidents; 

g. collaborate with technology suppliers to replace legacy systems whenever beneficial for 
security reasons but take into account critical system functionalities. 


These were further elaborated and detailed in an accompanying document [6], which provided
additional information on their implementation. Furthermore, the EECSP-Expert Group analyzed
whether the energy sector is sufficiently covered by existing legislation or whether more
action is needed to achieve effective cybersecurity [7]. The analysis was framed around two high-level
objectives, namely:


1) Secure energy systems that are providing essential services to the European society. 
2) Protect the data in the energy systems and the privacy of the European citizen. 


It also highlighted ten areas where further action was required:


1) Identification of operators of essential services for the energy sector at EU level. 
2) Risk analysis and treatment. 

3) Framework of rules for a regional cooperation. 

4) EU framework for vulnerabilities disclosure for the energy sector. 

5) Define and implement cyber response framework and coordination. 

6) Implement and strengthen the regional cooperation for emergency handling. 

7) Establish a European cyber security maturity framework for energy. 

8) Establish a cPPP for supply chain integrity. 

9) Foster European and international collaboration.

10) Capacity and competence build-up. 


2.2 Microgrids 


Microgrids are defined as low voltage networks with capacities ranging from a few hundred kilowatts to a couple of
megawatts. They include distributed generation sources, local storage devices and controllable loads,
and although they are connected to the distribution network, they support islanded operation if 
necessary, allowing restoration of the connection once faults in the distribution network have been 
resolved. These topologies create entirely new and more complex asset classes, such as hardware, 
firmware, software, communications systems and storage capabilities. Understanding communication 
and data flows is important to ensure reliability and resilience. The European Union’s Expert Group on 
the Security and Resilience of Communications Networks and Information Systems for Smart Grids [8] 
has identified and categorized relevant assets that should be protected against cyber threats. That 
involves all critical energy assets within the Transmission, Distribution and Generation space which 
can: 


1. Cause an International, cross border, national or regional power outage or damage to 
infrastructure; 

2. Cause a significant impact to Energy market participants; 

3. Cause a significant impact on Operations and Maintenance of the energy grid; 

4. Pose a significant risk to Personal Data of citizens (Privacy); 

5. Cause significant safety issues for people. 


Furthermore, there are certain issues that are specific to their communication systems. For instance,
network management can be complex and time-consuming, and the communication systems are built on
different vendor-specific devices and protocols. Moreover, they face several cybersecurity challenges
[9] [10]. The SDN paradigm, with its capability to separate the control plane from the data plane, can
provide flexibility in controlling, managing, and dynamically reconfiguring such systems to meet their
specific quality of service requirements [11]. Thus, it is vital to develop a secure, resilient and efficient
SDN-based system [12] [13]. Security solutions such as IDS, firewalls, and encryption methods play a
significant role in securing conventional networks. However, these mechanisms cannot be
generically deployed, as they have many limitations in environments with strict application
requirements such as latency and bandwidth [14]. In addition, cyberattacks are becoming more
sophisticated and complex; they are able to target multiple layers of a communication
system at the same time [15]. Furthermore, because several logical domains must interoperate, security
requirements differ from one domain to another. For example, the transmission domain requires
delay-efficient key management, whereas the market domain requires large-scale key management
[9]. Therefore, it is desirable to combine several security mechanisms rather than apply a single
security approach or deploy a specific security technology to prevent or mitigate cyberattacks.


2.3 Guidelines and Standards 


Leszczyna et al. [16] surveyed and presented the state-of-the-art standards and protocols
implemented for the smart grid to ensure that information is processed without incident. The survey identified
multiple initiatives related to smart grid standardization, namely:


• CEN-CENELEC-ETSI Smart Grid Coordination Group

• European Commission Smart Grid Mandate Standardization M/490

• German Standardization Roadmap E-Energy / Smart Grid

• IEC Strategic Group 3 Smart Grid

• IEEE 2030

• ITU-T Smart Grid Focus Group

• Japanese Industrial Standards Committee (JISC) Roadmap to International Standardization for
Smart Grid

• OpenSG SG Security Working Group

• Smart Grid Interoperability Panel

• The State Grid Corporation of China (SGCC) Framework


All the identified relevant standards are presented in Table 1. 


Table 1: Standards for protection in smart grids [16]. 


| No. | Standard | Scope | Applicability | Range | Pub. |
|-----|----------|-------|---------------|-------|------|
| 1. | NISTIR 7628 | Smart Grid Cybersecurity | All components | US | 2014 |
| 2. | NERC CIP | Bulk electric system cybersecurity | All components | US | 2013 |
| 3. | IEEE C37.240 | Cybersecurity of communication systems | Substations | worldwide | 2014 |
| 4. | IEC 62443 | IACS cybersecurity | IACS (SCADA) | worldwide | 2009 |
| 5. | Cybersecurity Procurement | Cybersecurity requirements for procurement | IACS (SCADA) | US | 2008 |
| 6. | AMI System Security Requirements | Cybersecurity requirements for procurement | AMI | US | 2008 |
| 7. | Privacy and Security of AMI | Security and privacy requirements | AMI | Netherlands | 2010 |
| 8. | DHS Catalog | IACS cybersecurity | IACS (SCADA) | US | 2009 |
| 9. | ISO/IEC 27019 | Power systems’ IACS security | IACS (SCADA) | worldwide | 2013 |
| 10. | IEC 62351 | Security of communication protocols | All components | worldwide | 2007 |
| 11. | IEEE 1686 | Cybersecurity | IEDs | worldwide | 2007 |
| 12. | ISO 15118 | Vehicle-grid communication | PEV and relevant communication infrastructure | worldwide | 2014 |
| 13. | VGB S-175 | Cybersecurity requirements for power plants | Power plants | Germany | 2014 |
| | General application standards and guidelines that specify cybersecurity requirements. | | | | |
| 14. | ISO/IEC 27001 | IS management | General | Worldwide | 2013 |
| 15. | GB/T 22239 | IS management | General technical | China | 2008 |
| 16. | GB/T 20279 | Security requirements for firewalls and similar devices | General technical | China | 2015 |
| 17. | ISO/IEC 19790 | Security requirements for cryptographic modules | Technical | Worldwide | 2012 |


Furthermore, the International Electrotechnical Commission published and maintains a Smart Grid
Standards Map [17]. The map allows the identification of all relevant standards for any part of a smart
grid and includes security as a cross-cutting function, as presented in Figure 4.

Figure 4: Smart Grids Standards Map. [17]


3. State of the Art Cyber security solutions and technologies 


This section presents state of the art cyber security solutions and technologies. For each of the five 
functions of the NIST framework we discuss background on the function itself, and its relevance and 
applicability to the context of the energy sector at large, but also specifically to microgrids. 
Additionally, we present Key Performance Indicators for the evaluation of cybersecurity solutions 
relevant to the function, and present solutions that have been identified across all technology
readiness levels. 


3.1 Identify 


3.1.1 Background on the Function 


The Identify function is the groundwork for all the cybersecurity solutions and functions that follow. To
implement a holistic cybersecurity approach successfully, enterprises need
to identify all their assets, including hard security assets such as servers and networks, soft assets such
as software, data and people, but also their governance, risk management approach, and business
environment. The identification process involves considering what type of information is likely
to be exchanged by an organization. However, identifying the relevant data or information is only the first
step. The organization must also determine whether the risk of a cyber-attack is high or low, and whether
the consequences of a breach are likely to be minor, moderate, or severe. The NIST framework
identifies the following categories of cybersecurity solutions which are relevant to the Identify
function:


• Asset Management: The data, personnel, devices, systems, and facilities that enable the
organization to achieve business purposes are identified and managed consistent with their
relative importance to organizational objectives and the organization’s risk strategy.

• Business Environment: The organization’s mission, objectives, stakeholders, and activities are
understood and prioritized; this information is used to inform cybersecurity roles, responsibilities,
and risk management decisions.

• Governance & Risk Management: The policies, procedures, and processes to manage and monitor
the organization’s regulatory, legal, risk, environmental, and operational requirements are
understood and inform the management of cybersecurity risk.

• Risk Assessment: The organization understands the cybersecurity risk to organizational operations
(including mission, functions, image, or reputation), organizational assets, and individuals.

• Risk Management Strategy: The organization’s priorities, constraints, risk tolerances, and
assumptions are established and used to support operational risk decisions.

• Supply Chain Risk Management: The organization’s priorities, constraints, risk tolerances, and
assumptions are established and used to support risk decisions associated with managing supply
chain risk. The organization has established and implemented the processes to identify, assess and
manage supply chain risks.


3.1.2 Theoretical background 


This Function guides the owner/operator in the development of the foundation for cybersecurity 
management, and in the understanding of cyber risk to systems, assets, data, and capabilities based 
on the following processes [18]: 


1. Physical devices and systems within the organization are inventoried. 

2. Software platforms and applications within the organization are inventoried. 
3. Organizational communication and data flows are mapped. 

4. External information systems are catalogued. 


5. Resources (e.g., hardware, devices, data, time, personnel, and software) are prioritized based on 
their classification, criticality, and business value. 

6. Cybersecurity roles and responsibilities for the entire workforce and third-party stakeholders (e.g.,
suppliers, customers, partners) are established. 
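
As a minimal sketch of how an inventory of this kind could support the prioritization step, the following Python fragment records a few assets and ranks them by a combined score. The Asset fields, the 1 to 5 scales, the equal-weight sum and the example asset names are illustrative assumptions; they are not prescribed by the NIST Framework or by [18].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """One inventory entry (hypothetical schema, for illustration only)."""
    name: str
    kind: str            # e.g. "device", "software", "data flow", "external system"
    criticality: int     # assumed scale: 1 (low) to 5 (high)
    business_value: int  # assumed scale: 1 (low) to 5 (high)


def priority(asset: Asset) -> int:
    # Assumed scoring rule: equal-weight sum of criticality and business value.
    return asset.criticality + asset.business_value


def prioritize(assets):
    # Highest priority first; ties are broken by name for a stable order.
    return sorted(assets, key=lambda a: (-priority(a), a.name))


inventory = [
    Asset("RTU-substation-1", "device", criticality=5, business_value=4),
    Asset("Billing portal", "software", criticality=2, business_value=4),
    Asset("SCADA historian", "software", criticality=4, business_value=3),
]

ranked = [a.name for a in prioritize(inventory)]
print(ranked)
```

In practice an organization would likely replace the simple sum with its own classification scheme; the point is only that the inventory and the prioritization criteria are kept together in one structured record.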


Technology is transforming the asset management industry at a speed and scale never seen before.
The global regulatory environment for cybersecurity and privacy is becoming more complex and
fragmented. This, combined with the regular reports of high-profile power-grid breaches,
creates an issue that requires attention. In the context of the business environment, common business
objectives for the grid are identified. These business objectives, which also account for regulatory and
cybersecurity requirements, provide a useful context for identifying and managing applicable
cybersecurity risks and mitigations. Four business objectives for power systems stakeholders are
identified in the literature [18]:


1. Maintain Safety; 

2. Maintain Power System Reliability; 

3. Maintain Power System Resilience; and
4. Support Grid Modernization. 


From a business perspective, cybersecurity attacks may affect every entity, from small businesses to
multinational companies. Outsider cybersecurity attacks can originate from a variety of actors, from activist
groups, to criminals, to state-affiliated organizations. As businesses rely more and more on electronic
transmission of data, it becomes imperative to recognize the impact of the virtual aspects of the supply
chain on the business and the increased potential for data breaches. Especially in the IT industry,
which is affected by a “gray market” of unauthorized dealers, fraudulent brokers, and defective parts,
the use of untrusted sources is increasingly becoming a cybersecurity issue. It is also a common
practice in businesses in all parts of the supply chain (physical and virtual), as stakeholders submit to
the pressures of cost and schedule [19]. Developing and implementing a cybersecurity risk
management mechanism facilitates better-informed decision making throughout the organization,
which in turn leads to more effective resource allocation, operational efficiencies, and the ability to
mitigate and respond rapidly to cybersecurity risk. By implementing a cybersecurity risk management
framework, infrastructure can be better secured. The policies, procedures, and processes to manage
and monitor the organization’s regulatory, legal, risk, environmental, and operational requirements
will be better understood, and management will be informed of cybersecurity risks. According to
[1], in order to implement a risk management framework, an organization needs to:


1. Establish and communicate the organizational cybersecurity policy. 


2. Coordinate and align cybersecurity roles and responsibilities with internal roles and external 
partners. 


3. Manage the legal and regulatory requirements regarding cybersecurity, including privacy and civil 
liberties obligations. 


4. Ensure that governance and risk management processes address cybersecurity risks.


The main objective of risk assessment is to identify threats and cybersecurity vulnerabilities and
determine their impact. The results of the risk assessment, expressed in terms of safety and security
controls, should be used when selecting an intelligent network solution. A risk-based approach can be
implemented in order to address the security aspects of the smart grid. By implementing risk
assessment strategies, an organization aims to understand the cybersecurity risk to organizational
operations (including mission, functions, image, or reputation), organizational assets, and individuals.
According to NIST [1], in a risk assessment process:

1. Asset vulnerabilities are identified and documented

2. Cyber threat intelligence is received from information sharing forums and sources 
3. Threats, both internal and external, are identified and documented 

4. Potential business impacts and likelihoods are identified 

5. Threats, vulnerabilities, likelihoods, and impacts are used to determine risk 

6. Risk responses are identified and prioritized 
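
The last two steps can be illustrated with a small risk register. The sketch below assumes a common qualitative convention in which risk is scored as likelihood multiplied by impact on 1 to 5 scales; this convention, the thresholds in level() and the example entries are assumptions made for illustration and are not mandated by the NIST Framework [1].

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One risk register entry (hypothetical schema)."""
    threat: str
    vulnerability: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Assumed convention: risk = likelihood x impact.
        return self.likelihood * self.impact


def level(score: int) -> str:
    # Illustrative thresholds; an organization would set these
    # according to its own risk tolerance.
    if score >= 15:
        return "high"
    if score >= 8:
        return "medium"
    return "low"


register = [
    Risk("Phishing of operator credentials", "No multi-factor authentication", 4, 4),
    Risk("Malicious firmware update", "Unsigned firmware accepted by field device", 2, 5),
    Risk("Port scan from the Internet", "Exposed management interface", 3, 2),
]

# Risk responses would be planned in this order, highest score first.
prioritized = sorted(register, key=lambda r: r.score, reverse=True)
for r in prioritized:
    print(f"{r.score:>2} {level(r.score):<6} {r.threat}")
```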


With the transformation of the electrical system and the two-way flow of electricity and information, IT
and telecommunications infrastructure has become critical infrastructure in the energy sector. A global
cybersecurity strategy for the grid should address these conditions and combine infrastructure development
with domain-specific solutions for the different parts of the grid under a common strategy to ensure effectiveness.
The cybersecurity of the grid can be considered as a supply chain problem as well. Each phase of the
supply chain carries the risk that counterfeit products or compromised components may be inserted
in the grid. These supply chain risks are constantly growing, since new technology is globally
sourced and companies are not always familiar with the security and reliability needs of critical
infrastructure systems with 30-year life spans. Sophisticated supply chains prioritize risk in order to
allocate the most stringent scrutiny and security to the highest priority components. This kind of risk
framework is also needed for the smart grid supply chain. As the number of IoT devices connected to
the grid is increasing, it is not feasible to provide the same level of physical and cyber scrutiny and
security to all of them. The framework should enable a tiered system of risk-based security measures,
which provide the full measure of protection where there are system-wide, extended impacts [20].


In general, a risk management project will include the phases of risk identification, risk analysis and 
assessment, responding to risks, monitoring and evaluation [21] - [22]. The organization's priorities, 
constraints, risk tolerances, and assumptions should be established and used to support risk decisions 
associated with managing supply chain risk. The processes to identify, assess and manage supply chain 
risks can be summed up as follows [1]: 


1. Cyber supply chain risk management processes are identified, established, assessed, managed, and 
agreed to by organizational stakeholders 


2. Suppliers and third-party partners of information systems, components, and services are 
identified, prioritized, and assessed using a cyber supply chain risk assessment process 

3. Contracts with suppliers and third-party partners are used to implement appropriate measures 
designed to meet the objectives of an organization's cybersecurity program and Cyber Supply 
Chain Risk Management Plan. 

4. Suppliers and third-party partners are routinely assessed using audits, test results, or other forms 
of evaluations to confirm they are meeting their contractual obligations. 

5. Response and recovery planning and testing are conducted with suppliers and third-party
providers, also establishing suitable information security policies.

6. Change management procedures are established across the digital value/supply chains.


3.1.3 Key performance indicators 


In [23], key performance indicators for asset management of the smart grid are identified. The most
relevant KPIs are summarized and presented in Table 2.


Table 2: Key Performance Indicators for Asset Management. 


| Key Performance Indicator | Description |
|---------------------------|-------------|
| Asset CAPEX | CAPEX of the asset. It directly measures the capital expenditure and helps keep track of the initial investment in grid projects. |
| Asset OPEX | It measures the operational costs of the asset. |
| Asset Lifetime | The expected economic lifetime of the asset. Extending the economic lifetime of an asset avoids high short-term costs for the company. |
| Automated Remote Event Reading | Percentage of events that are successfully read and detected by network-connected components in less than 1 minute. Reliable event reading is necessary for fast restoration of service, providing improved network quality. |

In [24], KPIs on the business environment of e-businesses are presented. These KPIs are motivated by
the visible trend of traditional businesses transitioning to a networked environment, and they can be
applied to the power grid, which is transitioning to the smart grid by incorporating
information technology and adopting new strategies, products, processes and technologies. The most
relevant KPIs for the business environment of EPES are presented in Table 3.


Table 3: Key Performance Indicators for Business Environment. 


Key Performance Indicator | Description
Key processes | This KPI indicates which key processes enable an organization to deliver the customer value proposition and achieve productivity goals. It can demonstrate whether an organization is adopting a cybersecurity strategy for its key business processes with suppliers and customers.
Partnerships and collaborations | This KPI indicates the number of vital partnerships and collaborations of an organization. It assists in easily identifying the most important partners of an organization.
Risks and vulnerabilities | Indicates the most important risks that an organization is currently facing in terms of cybersecurity. It assists an organization in identifying a taxonomy of operational cybersecurity risks as well as other risks and security activities.


In [25], KPIs for the cyber risk management program of a company are indicated. The most relevant
KPIs for the use cases of SDN-microSENSE are presented in Table 4.


Table 4: Key Performance Indicators for Risk Management. 


Key Performance Indicator | Description
IT related incidents | The number of IT security related incidents reported by other firms in the last X months.
Security breaches | The number of security breaches identified in your organization.
Malicious web requests | The volume of IT system requests/traffic originating from unknown or malicious IP addresses.
Cybersecurity awareness | The % of relevant staff trained in cyber risk and IT security policy and procedures.
Personnel training levels | The % of relevant staff who have attested to having read and understood the IT security policy.


The most relevant KPIs for Risk assessment, as identified in relevant peer-reviewed academic 
literature, are presented in Table 5. 


Table 5: Key Performance Indicators for Risk Assessment. 


Key Performance Indicator | Description
Risk Exposure calculation | (Probability x Impact) The risk exposure is the product of the probability of a non-satisfactory result occurring and the loss associated with this non-satisfactory result. An interval using the risk exposure value can be defined to identify high, medium and low risk priorities [26].
Number of identified risks | The number of risks identified in an organization. It helps organizations identify the risk categories that present more risk factors. The risks can be classified according to categories or taxonomies [27].
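
The Risk Exposure calculation can be illustrated with a short worked example. The Python sketch below multiplies probability by loss and maps the result to a priority band; the band limits (1,000 and 10,000) and the sample risks are hypothetical values chosen for illustration, not thresholds taken from [26].

```python
def risk_exposure(probability, impact):
    """Risk exposure = probability of the unsatisfactory result x associated loss."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return probability * impact


def risk_priority(exposure, medium_from=1_000.0, high_from=10_000.0):
    """Map an exposure value to a priority band (limits are assumptions)."""
    if exposure >= high_from:
        return "high"
    if exposure >= medium_from:
        return "medium"
    return "low"


# Illustrative risks: (name, probability, loss in a monetary unit).
sample_risks = [
    ("ransomware on control centre", 0.25, 80_000.0),
    ("misconfigured gateway", 0.5, 4_000.0),
    ("lost visitor badge", 0.01, 5_000.0),
]
priorities = {
    name: risk_priority(risk_exposure(p, loss)) for name, p, loss in sample_risks
}
```

In practice each organization would set the interval limits according to its own risk appetite.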


Table 6 presents the KPIs for Risk Management Strategy, as identified in peer-reviewed literature and 
online sources [28]. 


Table 6: Key Performance Indicators for Risk Management Strategy. 


Key Performance Indicator | Description
Percentage of business strategy objectives mapped to enterprise risk management strategy | It indicates the percentage of business strategy objectives that are mapped to the organization's risk management strategy.
Risks mitigated in a project | The number of risks that have been mitigated due to the organization's risk management strategy.
Total cost saved through mitigation | The total cost that is saved through the mitigation of the risks after implementing the risk management strategy of the organization.


Table 7 presents the KPIs for the Supply Chain Risk Management Strategy of an organization, as identified in
peer-reviewed literature and online sources.


Table 7: Key Performance Indicators for Supply Chain Risk Management.


Key Performance Indicator | Description
Supplier delivery efficiency | A metric that dynamically identifies whether a specific supplier is not meeting the company's target [29].
Expected Revenue Disruption | It indicates the organization's expected lost revenue per period from a supply chain disruption [30].
Flexibility and responsiveness | A metric that identifies the levels of flexibility and responsiveness across the value chain, indicating whether the organization can absorb disruptions and adapt to change [31].
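
Two of these KPIs lend themselves to a simple numerical illustration. The Python sketch below is an assumption-based example rather than the formulations of [29] or [30]: delivery efficiency is taken as the share of on-time deliveries, the 95% target is hypothetical, and expected revenue disruption is taken as the probability-weighted sum of losses over a set of disruption scenarios.

```python
def delivery_efficiency(on_time_deliveries, total_deliveries):
    """Share of deliveries that arrived on time (assumed definition)."""
    if total_deliveries <= 0:
        raise ValueError("total_deliveries must be positive")
    return on_time_deliveries / total_deliveries


def misses_target(on_time_deliveries, total_deliveries, target=0.95):
    """True when the supplier falls below the (hypothetical) target share."""
    return delivery_efficiency(on_time_deliveries, total_deliveries) < target


def expected_revenue_disruption(scenarios):
    """Sum of probability x lost revenue over (probability, loss) scenarios."""
    return sum(probability * loss for probability, loss in scenarios)


# Illustrative values for one reporting period.
supplier_a_flagged = misses_target(90, 100)
supplier_b_flagged = misses_target(98, 100)
expected_loss = expected_revenue_disruption([(0.5, 1_000.0), (0.25, 4_000.0)])
```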


3.1.4 Identified solutions 


In recent years, Stapelberg [32] conducted a review of various asset management models and 
frameworks of infrastructure and industrial asset owners both in the public and private sectors as well 
as those of asset management service providers. He observed that asset management frameworks 
adopted by infrastructure organizations such as utilities are more inclined towards a life cycle process 
approach. The processes range sequentially from asset planning, creation, operations, maintenance to 
performance measurement. These asset life cycle frameworks incorporate risk, quality and 
environmental management to form a total asset management framework. Through his observations, 
he concluded that most asset management frameworks lack a system-wide focus
and that the implementation of asset management should start with the development of more
advanced technical modelling and other analytical tools that can interoperate.


In [33], the author proposed an asset management framework built on the principle that
the core processes directly help an organization achieve the “best
value” for its stakeholders. He proposed a Strategic Infrastructure Asset Management process as a
strategic, fully integrated approach directed to gaining the greatest lifetime utilization, effectiveness 
and value from infrastructure assets. Brown and Humphrey [34] proposed an asset management 
structure based on three pillars of competency: management, engineering, and information. The 
suggested structure can address the most pressing issues the utilities are facing: aging infrastructure, 
reliability, asset utilization, planning, automation, maintenance, project selection, and risk 
management. Murphy and Murphy [35] have identified a list of critical considerations in the way that 
companies handle electronic data transmission in their supply chain environment, as they tend to 
increasingly consider the potential impact of their supply chain process on their IT security programs. 
Furthermore, according to von Solms and van Niekerk [36], in ICT-based systems
information cannot be deemed secure unless all resources and processes dealing with that
information are secure as well. They pointed out that, in an organizational context, ensuring the
security of the organization’s information is firstly a case of correctly defining the authorized entities
for any given piece of information. Moreover, cyber security is not only the protection of cyberspace
itself, but also the protection of those that function in cyberspace and any of their assets that can be
reached via cyberspace. Trim and Lee [37] identified an organizational strategic governance
framework, by considering how policy issues underpin the development and implementation of a 
Cybersecurity strategy. They also developed a generic risk management strategy as an integral 
component of business continuity management planning. Additionally, Henrie [38] utilized an
exploratory case study approach to identify and discuss a cybersecurity risk management process for
SCADA systems, which are subject to increasing risks based on technology vulnerabilities,
cyber-threats, and system consequences. He presented a historical view of the risks and the types of systems
involved, and provided a deeper understanding of cyber-threats and suggestions on how to
mitigate this expanding risk. Finally, Waithe [39] assessed the constructs and correlations of enterprise
risk management and IT effectiveness. He addressed risk management from a holistic perspective,
acknowledging both strategic and tactical initiatives, to ensure that risk-based decision making is
assessed from all aspects of the enterprise environment.


Depoy et al. [40] developed a risk assessment methodology for Physical and Cyber Attacks on Critical 
Infrastructure by combining information about the concerns for the facility under assessment, the 
asset failures, the capabilities of the adversary attacking the facility and the protective features present 
at the facility in order to produce risk estimates. A scenario-based approach to cyber risk assessment 
used by the CSSC for the National Cyber Security Division of the US Department of Homeland Security
is described in [41]. In [42], Permann and Rohde developed a five-step cyber vulnerability
assessment methodology for SCADA systems based upon the experience of assessing the security of
multiple SCADA systems. In addition, a cyber-terrorism SCADA risk framework is presented in [43]. The
framework consists of three stages: (i) risk assessment, (ii) capability assessment model, and (iii)
controls. Datta Ray et al. [44] proposed a unified risk management approach for the Smart grid
security, including threat and vulnerability modeling schemes which help in identifying and 
categorizing the threats, as well as in analyzing their impacts. Katsumata et al. [45] described a 
Cybersecurity Risk Management methodology for Critical infrastructure which incorporates both 
qualitative and quasi-quantitative analyses for improved decision-making regarding effectiveness and 
ROI. Finally, Ganin et al. [46] proposed a decision framework for Cybersecurity Risk Assessment and 
Management that quantifies threat, vulnerability, and consequences through a set of criteria designed 
to assess the overall utility of cybersecurity management alternatives. The proposed framework 
bridges the gap between risk assessment and risk management, allowing an analyst to ensure a
structured and transparent process of selecting risk management alternatives.


NIST IR 7622 on Supply Chain Risk Management [47] documents a set of repeatable and commercially
reasonable supply chain assurance methods and practices that offer the means to obtain a greater
level of understanding, visibility, traceability, and control throughout the ICT supply chain. Boyson [48]
identified a research-based capability/maturity model for the cyber supply chain risk management of
critical IT systems. His model captured the spectrum of lagging, common, and best practices associated
with supply chain risk management. Finally, in [49], Zhengping et al. reviewed Complex Systems
technologies for Supply Chain Risk Management, identifying that the five most relevant technologies
for supply chain risk management are: (i) evolution and adaptation, (ii) game theory, (iii) complex
networks, (iv) dynamic systems, and (v) ABS.


The Key Performance Indicators identified for the Asset Management process are Asset CAPEX, Asset 
OPEX, Asset Lifetime, Annual infrastructure renewal and Automated Remote Event reading. In order 
to identify these values, each organization needs to understand its capital expenditures and
operating expenses. An asset management system can bring automation to an organization. By
identifying critical assets, organizations can target and refine investigative activities, maintenance 
plans, and financial plans at the most crucial areas. Indicatively, several asset management tools 
currently on the market are briefly described below: 


• SAP Enterprise Asset Management [50]: This software is a maintenance and asset management
solution that manages the entire lifecycle of an organization’s physical assets. It facilitates
maintenance scheduling, tracks and monitors assets, promotes facility management and provides
organizations with reporting and analytics capabilities.

• IBM Maximo Asset Management [51]: It is an enterprise asset management software that
supports regular asset monitoring throughout the enterprise. It provides near real-time
visibility into asset usage across multiple sites, extends the useful life of equipment, and improves
return on assets. The software can provide warning signals from assets to reduce unplanned
downtime and increase operational efficiency.

• SolarWinds Service Desk [52]: It is an IT asset management tool that includes an expansive dashboard which
aligns contracts and licenses to the assets they support, so that everything can be monitored from
a single location. This asset management software notifies users of potential risks and helps them
take proactive steps to ensure all software assets have been updated with the latest antivirus
protection.


Furthermore, the external business environment is a dynamic and competitive environment composed 
of numerous outside organizations. Organizations tend to track, analyze, evaluate and monitor the 
macro-environmental (external marketing environment) factors that have an impact on them. The Key 
Performance Indicators for the Business Environment of an organization, as identified in previous 
sections, are Financial Strength, Key processes, Partnerships and collaborations and Risk and 
Vulnerabilities. In order to provide details on the KPIs relating to the aspects of the business
environment, a thorough knowledge of all internal and external factors affecting the operation of the
company is necessary. The other KPIs are quantifiable, while the Risk and Vulnerabilities KPI is a
management measure that indicates the possibility of future adverse impacts and how risky an activity
is, and can therefore also be considered a Key Risk Indicator. A widely used tool for the evaluation of the
external business environment is the PESTLE analysis [53]. It is a strategic tool for understanding
market growth or decline, business position, potential and direction for operations. As defined within
PESTLE, the business environment can be grouped into six key sub-environments: political, economic,
social, technological, legal and environmental. Each of these sectors might create a unique set of
challenges and opportunities for businesses.


The Key Performance Indicators identified for governance and Risk Management (IT related incidents, 
Security Breaches, Malicious web requests, Cybersecurity awareness, Personnel Training levels) 
provide quantifiable measurements about the cyber risk management process of an organization, 
aiming to evaluate its level of protection against cyber threats. A comprehensive Risk management 
program is crucial in achieving an organization's strategic objectives. This risk management framework 
should map risks to policies, processes and regulations, while it is critical to include a comprehensive 
risk library and a smart monitoring process. A summary of the most relevant Cybersecurity Risk 
Management tools and Governance, Compliance and Risk Management software is presented below: 


• LogicGate [54]: It is an IT and security risk management platform connecting IT risk processes
across an enterprise. Its process automation enables organizations to transform mission-critical
risk and compliance activities by enhancing controls, increasing flexibility, and reducing risk.

• LogicManager [55]: It is an IT risk management, security, and privacy solution built around
an Enterprise Risk Management (ERM) program. It provides an effective risk-based approach
for governance activities.

• CURA [56]: CURA is an ERM software offering solutions in the fields of project risk management,
enterprise risk management, operational risk management and incident risk management. It
enables organizations to better manage risks by embedding and integrating risk management in
business processes, linking risk management directly to decision making and by monitoring
organizational and individual performance against goals and objectives.

• BitSight [57]: It is a cybersecurity risk management tool that focuses on external cyber risk
management and optimizes an organization's third-party risk management program. It offers a
platform for quantifying the external cybersecurity posture of organizations using publicly
accessible data. Furthermore, it can evaluate the performance of an organization’s cybersecurity
program through broad measurement, continuous monitoring, and detailed planning and
forecasting in an effort to measurably reduce cyber risk.


The establishment of a risk assessment framework which identifies, analyses and evaluates risks is an 
efficient way to ensure that the cyber security controls are appropriate to the risks that an organization 
faces. The quantifiable KPIs of Risk Assessment identified in previous sections are Risk Exposure
calculation and Number of identified risks. They can identify the risk priorities of an organization and 
the risk categories which present more risk factors for an enterprise. Year-round cybersecurity risk 
assessments are possible thanks to SaaS platforms which offer continuous monitoring, automated 
testing, and user-friendly dashboards and reports. Existing software tools that can be used for this 
cybersecurity phase are listed below: 


• SolarWinds Access Rights Manager [58]: It is a cybersecurity risk management and assessment tool
which analyzes and audits access across files, folders, and servers of an organization and helps
enforce cybersecurity policies with automated secure account provisioning. It provides a central
place for IT compliance management and assesses the security risks of an organization such as user
authorizations and access permissions to sensitive data.

• vsRISK Cloud [59]: It is an online tool for conducting an information security risk assessment. This
tool can create scenario-based risks, enabling users to choose the risk assessment or data
protection impact assessment methodology that best suits their organisation's circumstances.
vsRISK is aligned with ISO/IEC 27001:2013, NIST SP 800-53 and CSA CCM v3.


A risk management strategy provides a structured and coherent approach to identifying, assessing and 
managing risks in organizations. Nowadays, the implementation of a risk management strategy is 
fundamental to effective corporate governance. It aims at proactively identifying and understanding
the factors and events that may impact the achievement of strategic and operational objectives,
followed by the management, monitoring and reporting of these risks. The identified KPIs (Percentage
of business strategy objectives mapped to enterprise risk management strategy, Risks mitigated in a
project and Total cost saved through mitigation) for the risk management strategy are measurable
KPIs which evaluate the effectiveness of the risk management strategy of the organization and
demonstrate the benefits of the implementation of a risk management program in an organization.


Risk management plays a vital role in effectively operating supply chains of an organization in the 
presence of a variety of uncertainties [60]. Supply chain risk management has now been heavily 
deployed across services organizations of all types. In recent years, the concept of Cyber supply chain
risk management (CSCRM) has arisen. CSCRM is an emerging management discipline resulting from the
fusion of approaches, methods, and practices from the fields of cybersecurity, enterprise risk
management, and supply chain management [48]. It focuses on gaining visibility and control not only
over the focal organization but also over its extended enterprise partners. In this respect, the KPIs used
for the Supply Chain Risk Management (SCRM) section are Supplier delivery efficiency, Expected
Revenue Disruption and Flexibility and responsiveness. These measurable KPIs can indicate a supplier’s
efficiency, identify the flexibility across the supply chain of the organization and evaluate the
organization’s lost revenue stemming from a supply chain disruption. There are numerous commercial
tools for supply chain risk management which aim to provide comprehensive and real-time insights on
an organization’s supply chain and prevent supply chain disruptions. Some commercially
available SCRM tools are briefly described below: 


• anyLogistix [61]: It is an SCRM software which allows users to replicate a supply chain network and
simulate its operations, making allowance for the uncertainties of the real-world operation.
Simulation modelling is used to demonstrate how instability can affect the supply chain operations
of an organization. Risks can be assessed quantitatively, calculating both the event’s probability
and its associated losses.

• Coupa Risk Aware [62]: It is a supply chain risk management software which dynamically scores
suppliers based on supplier behavior and third-party data on credit, restricted party and other
risks. It combines financial, judicial, and news sentiment risk scores with community-rated scores
to provide a risk assessment.

• MetricStream Supplier Risk and Performance Management Solution [63]: This software provides
organizations with an efficient tool for managing, monitoring and tracking multiple stages of their
supplier relationships across the global supplier network. The MetricStream solution provides
organizations with enhanced awareness of their supply chain security and risks. By providing
insights on the supply chain, it improves business resilience through streamlined supplier risk
assessments and supports decision-making of the organization.


3.2 Protect 


This function refers to the development and implementation of suitable safeguards, which ensure the
delivery and availability of critical services. Accordingly, the function focuses on components,
operations and services that limit or contain the impact of a cybersecurity event.


3.2.1 Background on the function 


The NIST framework identifies the following categories of cybersecurity solutions which are relevant 
to the Protect function: 


• Identity Management, Authentication and Access Control: Access to physical and logical assets and
associated facilities is limited to authorized users, processes, and devices, and is managed
consistent with the assessed risk of unauthorized access to authorized activities and transactions.

• Awareness and Training: The organization’s personnel and partners are provided cybersecurity
awareness education and are trained to perform their cybersecurity-related duties and
responsibilities consistent with related policies, procedures, and agreements.

• Data Security: Information and records (data) are managed consistent with the organization's risk
strategy to protect the confidentiality, integrity, and availability of information.

• Information Protection Processes and Procedures: Security policies (that address purpose, scope,
roles, responsibilities, management commitment, and coordination among organizational
entities), processes, and procedures are maintained and used to manage protection of information
systems and assets.

• Maintenance: Maintenance and repairs of industrial control and information system components
are performed consistent with policies and procedures.

• Protective Technology: Technical security solutions are managed to ensure the security and
resilience of systems and assets, consistent with related policies, procedures, and agreements.


3.2.2 Theoretical background 


Digital information, such as confidential files, contracts and plans, state secrets, and health and other
records which may be stored online, is crucial for the operation of modern institutions. In this respect,
Identity Management is a vital part of every institution’s security plan as it protects the information 
against the rising threats of hacking, phishing, ransomware, and other malware cyber-attacks, while 
granting authorized people easy access to the very same data. Identity management consists of one 
or more processes to verify the identity of a subject attempting to access an object. Access control is 
a security technique that can be used to regulate who or what can view or use a resource environment, 
whereas Identity is a set of attributes related to an entity that computer systems use to represent a
person, organization, application, or a device. Access to physical and logical assets and associated
facilities is limited to authorized users, processes, and devices, and is managed consistently with the 
assessed risk of unauthorized access to authorized activities and transactions. In fact, there is a direct 
relationship between access control and identity management as the core function of an identity 
management solution is access control. The processes for identity management and access control can 
be summed up as follows [1]: 


1. Identities and credentials are issued, managed, verified, revoked, and audited for authorized
devices, users and processes

2. Physical access to assets is managed and protected

3. Remote access is managed

4. Access permissions and authorizations are managed, incorporating the principles of least privilege
and separation of duties

5. Network integrity is protected (e.g., network segregation, network segmentation)

6. Identities are proofed and bound to credentials and asserted in interactions

7. Users, devices, and other assets are authenticated (e.g., single-factor, multi-factor) commensurate
with the risk of the transaction (e.g., individuals’ security and privacy risks and other organizational
risks)
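
The principles of least privilege and separation of duties mentioned in item 4 can be sketched in a few lines of Python. This is a minimal illustration only; the role names, permission names and the conflicting pair are hypothetical and do not come from [1].

```python
# Hypothetical role-to-permission mapping: each role gets only what it needs.
ROLE_PERMISSIONS = {
    "operator": {"read_measurements", "acknowledge_alarm"},
    "engineer": {"read_measurements", "submit_setpoint_change"},
    "supervisor": {"read_measurements", "approve_setpoint_change"},
}

# Permission sets that one identity should never hold at the same time.
CONFLICTING_PERMISSIONS = [
    {"submit_setpoint_change", "approve_setpoint_change"},
]


def permissions_for(roles):
    """Union of the permissions granted by the given roles."""
    granted = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    return granted


def is_allowed(roles, permission):
    """Least privilege: deny anything not explicitly granted by a role."""
    return permission in permissions_for(roles)


def violates_separation_of_duties(roles):
    """True when the combined roles hold a full conflicting permission set."""
    granted = permissions_for(roles)
    return any(conflict <= granted for conflict in CONFLICTING_PERMISSIONS)
```

For example, `is_allowed(["operator"], "submit_setpoint_change")` is denied because no operator permission covers it, and assigning both "engineer" and "supervisor" to the same user is flagged by `violates_separation_of_duties`.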




Furthermore, an organization's ability to address cyber security risks is largely influenced by its internal 
capabilities, and the way they are equipped to prevent and manage cyber-attacks and incursions. An 
organization should train and prepare its members of staff to withstand, and respond to, the threats 
posed by cyber-attacks. The organization's personnel and partners are provided cybersecurity 
awareness education and are trained to perform their cybersecurity-related duties and responsibilities 
consistent with related policies, procedures, and agreements, through the following process:


1. All users are informed and trained 

2. Privileged users understand their roles and responsibilities 

3. Third-party stakeholders (e.g., suppliers, customers, partners) understand their roles and 
responsibilities 

4. Senior executives understand their roles and responsibilities 

5. Physical and cybersecurity personnel understand their roles and responsibilities 


In addition, NIST also addressed specific security requirements such as authorization, identification,
authentication, trust, access, control and privacy [64]. Khatoun et al. [65] propose actions
to ensure data security, such as the encryption of network traffic with robust symmetric
algorithms such as AES and Blowfish. Shielding data from security threats is more important today than
ever before. In order to manage data consistently and protect the confidentiality, integrity, and
availability of information, an organization needs to make sure that [1]:


1. Data-at-rest is protected

2. Data-in-transit is protected

3. Assets are formally managed throughout removal, transfers, and disposition

4. Adequate capacity to ensure availability is maintained

5. Protections against data leaks are implemented

6. Integrity checking mechanisms are used to verify software, firmware, and information integrity

7. The development and testing environment(s) are separate from the production environment

8. Integrity checking mechanisms are used to verify hardware integrity
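
One common way to realise the integrity checking of software, firmware and information is to compare a cryptographic digest against a trusted reference value. The Python sketch below uses SHA-256 from the standard library; the choice of SHA-256, the helper names and the sample payload are assumptions made for illustration, and the reference digest would normally come from a trusted source such as the vendor.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_integrity(data: bytes, expected_hex: str) -> bool:
    """Compare the digest of `data` with a trusted reference digest.

    hmac.compare_digest performs a constant-time comparison.
    """
    return hmac.compare_digest(sha256_hex(data), expected_hex.lower())


# Illustrative payload standing in for a firmware image.
firmware_image = b"firmware-image-v1"
reference_digest = sha256_hex(firmware_image)  # normally published by the vendor

untampered_ok = verify_integrity(firmware_image, reference_digest)
tampered_ok = verify_integrity(firmware_image + b"\x00", reference_digest)
```

A plain digest only detects modification; protecting the reference value itself (for example with a digital signature) is outside the scope of this sketch.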




Compliance effectiveness is largely based on how well the data of an organization is protected; it is an 
indicator that needs to be continuously monitored and reviewed. Organizations need to review
their data protection programs regularly to ensure that they comply with their target
values and regulations. Two sub-metrics that can be measured for the compliance KPI are the Mean Time
Between Failures (MTBF), which indicates the number of days that a company has operated without system failure, and the
Mean Time To Repair (MTTR), which estimates the mean number of hours a company needs to fix data security issues and
restore its working condition. Due to the sheer volume of data in an organization’s environment,
monitoring is becoming increasingly difficult. Monitoring the majority of sensitive data adds an extra
layer of security for the organization; the percentage of sensitive data that is being monitored can be 
calculated in real time, while, based on this knowledge, organizations can apply audit logs and 
additional logic for identifying and alerting on anomalous access to sensitive data. The number of 
Customer data incidents and Customer data related complaints are also metrics that should be 
measured and monitored constantly by the organizations, as it is crucial for the organization to 
maintain a positive customer experience and assure that customer data is not vulnerable to criminals. 
Finally, the financial impact of Data incidents and IT security breaches shall be evaluated as it is critical 
for an organization, in terms of lost or stolen data, customer mistrust, legal investigations, and 
recovery efforts. As suggested in [66], the estimation of this financial impact can be done by comparing 
the market value of the organization before and after an event. In this case, an event is defined as an 
announcement about an organization’s security breach in a major newspaper, while the estimation of 
the financial impact compares the market value of the company on the day before the event to its 
market value on the day after the event. 
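
The metrics discussed in this paragraph reduce to simple arithmetic, as the Python sketch below shows. It is an illustration under stated assumptions: MTBF is computed in the conventional way (operating time divided by number of failures), the financial impact is the plain difference between the market value on the day after and the day before the announcement as described for [66], and all function names and sample figures are hypothetical.

```python
def mtbf_days(operating_days, failures):
    """Mean time between failures: days of operation per failure."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to compute MTBF")
    return operating_days / failures


def mttr_hours(repair_hours):
    """Mean time to repair: average of the recorded repair durations."""
    if not repair_hours:
        raise ValueError("at least one repair duration is required")
    return sum(repair_hours) / len(repair_hours)


def monitored_share(monitored_records, sensitive_records):
    """Percentage of sensitive data records that are being monitored."""
    if sensitive_records <= 0:
        raise ValueError("sensitive_records must be positive")
    return 100.0 * monitored_records / sensitive_records


def breach_market_impact(value_day_before, value_day_after):
    """Change in market value across the announcement (negative means loss)."""
    return value_day_after - value_day_before


# Illustrative figures only.
mtbf = mtbf_days(365, 5)
mttr = mttr_hours([2.0, 4.0, 6.0])
coverage = monitored_share(750, 1000)
impact = breach_market_impact(500.0, 470.0)
```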


Organizations must ensure that proper processes and procedures are in place to manage the 
protection of information systems and assets. Misconfigured and vulnerable systems and 
unauthorized network changes can leave the network vulnerable to compromise and data leakage. 
However, proper configuration, change, and vulnerability management are notoriously difficult to 
implement and maintain. Security policies (that address purpose, scope, roles, responsibilities, 
management commitment, and coordination among organizational entities), processes, and 
procedures are maintained and used to manage the protection of information systems and assets. The 
NIST Cybersecurity Framework [1] provides a set of objectives that will assist an organization in building 
a comprehensive security plan, measuring the effectiveness and improving protection processes and 
procedures: 


1. A baseline configuration of information technology/industrial control systems is created and
maintained incorporating security principles (e.g. concept of least functionality)

2. A System Development Life Cycle to manage systems is implemented

3. Configuration change control processes are in place

4. Backups of information are conducted, maintained, and tested

5. Policy and regulations regarding the physical operating environment for organizational assets are
met

6. Data is destroyed according to policy

7. Protection processes are improved

8. Effectiveness of protection technologies is shared

9. Response plans (Incident Response and Business Continuity) and recovery plans (Incident Recovery
and Disaster Recovery) are in place and managed

10. Response and recovery plans are tested

11. Cybersecurity is included in human resources practices (e.g., deprovisioning, personnel screening)

12. A vulnerability management plan is developed and implemented
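
Items 1 and 3 above (baseline configuration and configuration change control) can be supported by automatically comparing a system's current settings with its approved baseline. The Python sketch below is a minimal illustration; the setting names and values are hypothetical, and a real deployment would read them from the managed devices.

```python
def config_drift(baseline, current):
    """Return {setting: (expected, actual)} for every deviation from baseline.

    A value of None means the setting is absent on that side.
    """
    drift = {}
    for key in sorted(baseline.keys() | current.keys()):
        expected = baseline.get(key)
        actual = current.get(key)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


# Hypothetical baseline following the concept of least functionality.
approved_baseline = {"ssh": "disabled", "telnet": "disabled", "ntp_server": "10.0.0.1"}
observed_config = {
    "ssh": "disabled",
    "telnet": "enabled",       # changed without approval
    "ntp_server": "10.0.0.1",
    "ftp": "enabled",          # service not present in the baseline
}
unauthorized_changes = config_drift(approved_baseline, observed_config)
```

Any non-empty result would then be routed through the change control process, either to be reverted or to be approved and merged into the baseline.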


19° 00: c4 


To ensure that maintenance and repairs of industrial control and information system components are
performed consistently with policies and procedures, an organization: i) establishes a process for
maintenance personnel authorization and maintains a list of authorized maintenance organizations or
personnel; ii) ensures that non-escorted personnel performing maintenance on the information system
have the required access authorizations; and iii) designates organizational personnel with the required
access authorizations and technical competence to supervise the maintenance activities of personnel who
do not possess the required access authorizations. To maintain the highest level of system availability
and protect its infrastructure, an organization should [1]:


1. Perform and log the maintenance and repairs of organizational assets, with approved and 
controlled tools 

2. Perform maintenance operations at predetermined, authorized times or on an approved, as- 
needed basis 

3. Develop and sustain maintenance policies and procedures to facilitate the implementation of the
information system security maintenance requirements and the associated information system
security maintenance controls

4. Perform and log remote maintenance of organizational assets in a manner that prevents 
unauthorized access 


Additionally, organizations must deploy protective technology to ensure cyber resilience. Technical
security solutions are managed to ensure the security and resilience of systems and assets, consistent
with related policies, procedures, and agreements. To this end, an organization should establish
and maintain an information security program [1] in which:


1. Audit/log records are determined, documented, implemented, and reviewed in accordance with 
policy 

2. Removable media is protected, and its use restricted according to policy. 

3. The principle of least functionality is incorporated by configuring systems to provide only essential 
capabilities. 

4. Communications and control networks are protected. 

5. Mechanisms (e.g., failsafe, load balancing, hot swap) are implemented to achieve resilience 
requirements in normal and adverse situations. 


3.2.3 Key performance indicators 


Effective Identity Management and Access Control processes are integral to driving business value and
reducing risk, while security metrics for Identity Management & Access Control are important as they
provide the basis for management decisions that affect the protection of the infrastructure. KPIs for
Identity Management & Access Control, as identified in literature, are summarized and presented in Table 8.


Table 8: Key Performance Indicators for Identity Management & Access Control. 


Key Performance Indicator | Description

Reachability count | A metric that indicates the number of access points (relative to a specific point of origin such as the Internet). A key assertion is that a reduction in the number of access points tends to reduce the cybersecurity risk. It can be calculated as $N_T = N_s + N_o + N_p$, where $N_s$ = number of ports (services) that respond to data transmitted from the point of origin, $N_o$ = number of machines that have network connectivity from inside the network to the point of origin, and $N_p$ = number of physical access points to unrestricted portable storage media drives.

Password reset volume per month | Indicates the number of password resets performed monthly. It is key to helping organizations measure the effectiveness of their identity and access management programs.

Number of new accounts provisioned | Identifies the number of new accounts that are provisioned inside an organization.

Number of security incidents due to critical role and access right combinations | Identifies the number of security incidents due to critical roles assigned to personnel and user access rights.
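As a small worked illustration of the reachability count, the sketch below sums the three components defined in Table 8. The container name `ReachabilityInputs`, its field names and the sample figures are hypothetical and chosen only for this example; in practice the counts would come from network scans, firewall rule reviews and physical inspection.

```python
from dataclasses import dataclass


@dataclass
class ReachabilityInputs:
    """Counts gathered from scans and inspection (hypothetical container)."""
    responding_ports: int       # N_s: ports (services) answering the point of origin
    outbound_machines: int      # N_o: machines with connectivity to the point of origin
    physical_media_points: int  # N_p: physical access points to unrestricted media drives


def reachability_count(inputs: ReachabilityInputs) -> int:
    """Return N_T = N_s + N_o + N_p."""
    return (inputs.responding_ports
            + inputs.outbound_machines
            + inputs.physical_media_points)


# Illustrative figures only: the same site before and after a hardening campaign.
before = ReachabilityInputs(responding_ports=14, outbound_machines=120,
                            physical_media_points=35)
after = ReachabilityInputs(responding_ports=6, outbound_machines=120,
                           physical_media_points=10)
print(reachability_count(before), reachability_count(after))
```

Tracking the value over time, rather than reading it in isolation, matches the assertion that a falling count tends to indicate a reduced attack surface.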


Furthermore, the goal of a security awareness program is to heighten awareness of the importance of
information systems security and of the possible negative effects of a security breach or failure. In
this training environment, an employee is expected to be an active participant in the process of
acquiring new insights, knowledge, and skills. Various metrics can help organizations determine the most
efficient and economical solutions for their training needs. The most relevant KPIs for Awareness and
Training, as identified in literature and KPI repositories, are summarized and presented in Table 9.


Table 9: Key Performance Indicators for Awareness and Training. 


Key Performance Indicator | Description

Number and type of security incidents before and after awareness campaign | Used to track the number and type of security incidents that occur before and after the awareness campaign [67]. This KPI may indicate whether the users know what to do and whom to contact if they suspect a computer security breach or incident.

Training Methodology | Indicates what methods are used to deliver training to the employees (e.g. instructor-led, peer-mentored, self-study).

Training Penetration Rate | Measures the percentage of employees completing a course or a content area of training compared to the total number of employees. It identifies the percentage of employees that have completed a specific training program.

Percentage of employees’ satisfaction with training | Provides the satisfaction rate of the employees with regard to an awareness or training campaign.


Data security metrics and measures can help organizations to (i) verify that their
security controls are in compliance with a policy, process, or procedure; (ii) identify their security
strengths and weaknesses; and (iii) identify security trends, both within and outside the organization’s
control [68]. Data security includes data encryption and key management practices that protect data
across all operational layers of an organization. A list of KPIs for Data Security is presented in Table 10.


Table 10: Key Performance Indicators for data security. 
Key Performance Indicator | Description

Percentage of Sensitive Data Monitored for Anomalous Access | Tracks the percentage of sensitive data inside an organization that is being monitored for malicious attacks [69].

Compliance | Indicates whether an organization’s data inventory follows data compliance regulations. The organization needs to continually review and update its policies to ensure compliance.

Customer Data Incidents | Counts how many and what type of incidents related to customer data losses have occurred in the firm [70].

Financial Impact of Data Incidents | Measures the total cost of discovery, response, and company value loss after a data incident.

Customer Data Related Complaints | The number of customer complaints that are related to data and privacy concerns.


The Key Performance Indicators for information protection processes and procedures can be
mapped to how well an organization applies the standards presented earlier. Specifically, Sani et al.
[71] state that the performance indicators should meet the need for reliability, availability, security
and maintainability of the data flow. G. Dondossola and R. Terruggia [72] also suggest some
performance metrics for securing information processes and procedures, which are presented in
Table 11.


Table 11: Key Performance Indicators for Information Protection Processes and Procedures. 
Key Performance Indicator | Description

Handshake time | The amount of time needed to establish a connection on different communication levels.

Round Trip Time - Measurements | The amount of time between the output of a measurement and the reception of the corresponding Transmission Control Protocol (TCP) acknowledgement by the Distributed Energy Resource.

Inter-Measurements Time | The amount of time between two consecutive measurements.

Inter-Setpoint Time | The amount of time between two consecutive setpoints.

Round Trip Time - Setpoint | The amount of time between the output of a setpoint request and the reception of the corresponding TCP acknowledgement by the MVGC.


With respect to maintenance, the performance indicators should address the needs of system
availability and protection through fast and low-cost maintenance. Useful KPIs for evaluating the
maintenance mechanisms used in EPES are presented in [73]. These KPIs are summarized in
Table 12.


Table 12: Key Performance Indicators for Maintenance. 


Key Performance Indicator | Description

Reliability index | Represents how well an organization’s assets are performing compared to those of its peers.

Maintenance cost | The amount of money spent on maintenance activities [74], [75].

Root Mean Square Error | The square root of the mean of the squared errors [76].

R(t) | The average non-discounted return [76].

σ(R(t)) | The standard deviation of the average non-discounted return [76].

ENS | The average value of the energy not supplied [76].

σ(ENS) | The standard deviation of the average value of the energy not supplied [76].
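The statistical KPIs in Table 12 can be computed directly from recorded values. The following minimal sketch shows the Root Mean Square Error and the mean and standard deviation of the energy not supplied (ENS). The sample values are invented for illustration, and the use of the sample (n - 1) standard deviation is an assumption, since the source does not state which estimator is used in [76].

```python
import math
from statistics import mean, stdev


def rmse(predicted, observed):
    """Square root of the mean of the squared errors."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(mean((p - o) ** 2 for p, o in zip(predicted, observed)))


# Hypothetical energy-not-supplied values (MWh), one per simulated episode.
ens_per_episode = [2.0, 4.0, 4.0, 6.0]
ens_mean = mean(ens_per_episode)   # ENS
ens_std = stdev(ens_per_episode)   # sigma(ENS), sample standard deviation (assumed)

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), ens_mean, ens_std)
```

The average non-discounted return R(t) and its standard deviation can be obtained in the same way by replacing the ENS samples with per-episode returns.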


As for protective technology, Key Performance Indicators for protective technologies are proposed
in [77]. The proposed KPIs take into consideration the number of customers, the operating voltage
of the grid, the topology of the grid (mesh, radial) and the type of protection used
(overcurrent). Furthermore, they take into consideration the percentage of photovoltaic (PV)
penetration, the average size of the Distributed Energy Resources and the location of the PV feeder.
In all, the proposed evaluation metrics are Loss of Load, Stability of the Grid and Safety of the Grid.


3.2.4 Identified solutions 


Security and privacy issues of the smart grid have been widely discussed in the literature. In [78],
Yanliang et al. implement smart grid security as a service, with all communication and data being
passed through their access control and intrusion detection service. Furthermore, Wang et al. in [79]
state that the resilience, reliability, and sustainability of a power grid could be improved significantly
by separating the large grid into networked microgrids. They also formulate and present a solution for
cybersecurity enhancement based on Blockchain and Directed Acyclic Graph, aiming to improve
network reliability, increase security and eliminate financial fraud. The National
Cybersecurity Center of Excellence of the United States of America [64] presents a solution for identity
management and access control built on a standards-based technical approach that unifies IdAM
functions across OT networks, PACS, and IT systems. Decusatis et al. [80] also present decentralized
energy resource management using the Ethereum blockchain in order to achieve better results in
access control and identity management. Their approach to digital identity management requires
smart meters to authenticate with the blockchain ledger, which mitigates identity-spoofing attacks.


Reachability count has been defined as the number of access points (relative to a specific point of
origin such as the Internet). A key assertion for this KPI is that a reduction in the number of access
points tends to reduce the cybersecurity risk. Measuring this KPI requires the complete network
configuration information (including connectivity and firewall rules). The systems of an
organization can be scanned to identify all network communication paths. Information
about physical access to computer ports is also needed; physical access to portable storage media
drives can be determined by inspection. Password reset volume per month, Number of new accounts
provisioned and Number of security incidents due to critical role and access right combinations can
be estimated from logs of the password resets performed, the accounts provisioned and the
security incidents, respectively. There are plenty of commercial tools dedicated to Identity
Management and Access Control. Among others, Microsoft Azure [81] can provide identity and access
management control for both hybrid and cloud environments. IBM Security Identity and Access
Assurance [82] offers a complete identity and access management platform built to help strengthen
compliance and reduce risk by protecting and monitoring user access in multi-perimeter
environments. Finally, RSA SecurID Suite [83] offers products and services for cyber threat detection
and response, identity and access management, online fraud prevention, and business risk
management. It combines access management and authentication with identity governance and user
lifecycle management.


Additionally, several institutes and researchers have raised awareness of cybersecurity in power
grids. The US NIST in [84] presented an overview of ICS threats and vulnerabilities, recommending
adequate countermeasures and policies. NERC and IEC [85] have published recommendations for
infrastructure protection for electric production and distribution. Nagarajan et al. [86] suggested a
framework aiming to teach everyday users the requisite cybersecurity skills through engaging,
entertaining and educational games. Moreover, Khatoun et al. [65] suggested that staff within the
organization can be empowered through Awareness and Training by providing a comprehensive
training program for developers and administrators, by alerting and advising users about existing
threats and, last but not least, by embedding continuity plans and disaster recovery.


Although contemporary technologies allow the collection of extensive amounts of data, for these to 
be used to their full potential, security and privacy are critical [87]. Data security is a set of standards 
and technologies that protect data from intentional or accidental destruction, modification or 
disclosure. Its primary aim is to protect the data that an organization collects, stores, creates, receives 
or transmits. It is crucial for the operation of an enterprise as data breaches can result in litigation 
cases and huge fines, not to mention damage to the reputation of an organization. It is therefore 
essential to keep the data flow secure and continuous. In order to achieve this, cybersecurity of power 
grids should meet the fundamental requirements of confidentiality, availability and integrity. As Li et 
al. state in [88], confidentiality refers to protecting the data from being accessed by unauthorized 
users. Availability refers to guaranteeing the data are accessible and timely. Integrity refers to assuring 
the data are accurate and trustworthy. Several architectures have been implemented in compliance 
with these requirements. He et al. in [89] presented an efficient DoS resistant broadcast authentication 
mechanism to secure Drip protocol. Demertzis et al. in [90] proposed a system for the cybersecurity of 
smart energy grids. Bretas et al. in [91] presented a system to handle malicious data attacks. Data 
security has become essential for every enterprise. Role-based access control is one method that can 
keep data secure and allows the organizations to provide specific accesses to users, based on their role 
in an organization. Furthermore, numerous software solutions for Data Security of organizations are 
commercially available. Indicatively, Kaspersky Endpoint Security [92] eliminates vulnerabilities, helps 
in preventing loss or theft of confidential business data and uses encryption to prevent data being 
accessed by cybercriminals. IBM Security Guardium [93] protects all types of data from growing 
threats across diverse on-premises, hybrid, and public cloud environments, by using data activity 
monitoring and alerting, encryption, blocking, masking and advanced data security analytics. Finally, 
Check Point Data Loss Prevention [94] preemptively protects organizations from unintentional loss of
valuable and sensitive data, while ensuring compliance with legislation and standards.


Several approaches have been utilized for maintenance implementation with respect to cybersecurity.
Rafiei et al. [73] proposed a novel approach for smart grid maintenance. In order to overcome the
limitations due to the hidden functions of protection systems, Reliability Centered Maintenance is
applied to the whole protection system and the reliability of the protection system(s) is calculated.
Rocchetta et al. [76] developed a reinforcement-learning framework to optimally manage the
maintenance and the operation of smart grids. It is also equipped with prognostics and health
management capabilities. Sadeghian et al. [74] proposed a multi-objective optimization model for
maintenance scheduling of generation units based on the global
criterion approach, adopting a suitable compromise function. Zhou et al. in [75] presented a
maintenance system based on multi-dimensional data analysis and fault prediction from the state of the
equipment. Nagarajan et al. [95] presented a routing algorithm for identifying locations. Finally, Wu
et al. [96] presented a neutral online visualization-aided autonomous evaluation framework for
evaluating machine learning and data mining algorithms for preventive maintenance of the power grid.
In [73], the reliability index and maintenance cost were evaluated by applying real historical data of a
smart grid distance protection system and by utilizing the Hardware-In-the-Loop (HIL) real-time
simulation approach. The Root Mean Square Error, the average non-discounted return, the standard
deviation of the average non-discounted return, the average value of the energy not supplied, and the
standard deviation of the average value of the energy not supplied were evaluated in [76] by
comparing the proposed reinforcement learning framework with Bellman’s optimality.


With respect to protective technologies, Qi et al. in [77] propose a holistic attack-resilient framework
to protect the integrated distributed energy resources and the power grid infrastructure from
malicious cyber-attacks. Jahan et al. in [97] implement a real-time smart grid scenario with simulated
attacks to study the security challenges, on the premise that cybersecurity ensures that the grid
has the capability to monitor and analyze changing conditions. Balda et al. in [98] propose a new power
grid architecture based on E-LAN for better security results.


3.3 Detect 


This function as specified by NIST integrates the development and implementation of activities 
relevant to the detection of cybersecurity event occurrences, enabling their timely discovery. 


3.3.1 Background on the function 


The NIST framework identifies the following categories of cybersecurity solutions which are relevant 
to the Detect function: 


• Anomalies and Events: Anomalous activity is detected, and the potential impact of events is
understood.

• Security Continuous Monitoring: The information system and assets are monitored to identify
cybersecurity events and verify the effectiveness of protective measures.

• Detection Processes: Detection processes and procedures are maintained and tested to ensure
awareness of anomalous events.


3.3.2 Theoretical background 


The transition from today’s power systems to the smart grid will be a long evolutionary process
that brings growing challenges with respect to cybersecurity. Achieving all-encompassing
component-level security in power system IT infrastructures is increasingly difficult,
due to its cost and potential performance implications on the uninterruptible balancing
of demand and generation. However, there is strong potential to improve the security of the system
by leveraging the knowledge of the physical processes and the significant amount of redundant
information, e.g. state estimation and load forecasting [99].


Temporal anomalies in the substation facilities, e.g., user interfaces, IEDs and circuit breakers, have
been a focus of cybersecurity efforts [100]. Remote access to a substation network from corporate
offices or locations external to the substation is not uncommon for control and maintenance purposes.
Dial-up, VPN, and wireless are available mechanisms between remote access points and the substation
LAN. Indicative anomalies include (a) intrusion attempts, (b) changes to the file system and (c) changes
to the target system’s settings/status, and they are mainly expressed through intrusions in (i) network
communication protocols, (ii) IEDs, protective relays, circuit breaker status and merging units, (iii) the
user interface (HMI) and engineering units, and (iv) firewall logs and rules [101].


Possible intrusions into the substation communication network can originate from outside or inside the
substation network. As an example of an inside attack, a USB drive that has already been infected by an
attacker may be used to install malware on the substation user interface; the malware may then open a
predefined communication port or execute hacking tools. As an example of an outside attack, the
remote access points used for maintenance, control or operation may be targeted: once an intruder
compromises an access point, the attacker may be able to pass the firewall and gain access to the
substation ICT network [102] [103].


Other anomalies can be caused by data integrity attacks, e.g. a fabricated data packet that instructs a
relay to trip when not needed or that delays the execution of time-critical code required for balancing
system operation. In this area, load forecasting data anomalies include (i) distorting the load data with
a specific ramping function (ramping attack), (ii) replacing a set of contiguous data points in the original
time series data with a set of new values that form a smooth curve together with the neighboring
data points in the original data (smooth-curve based attack), (iii) modification of the output data based
on the output of the forecasting models with the falsified input data (Forecasting Model Misuse), and
(iv) changes to the coefficients in a regression load forecasting model (Forecasting Model attack) [104].
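To make the first two load data anomalies more concrete, the sketch below shows one possible way of generating them on an hourly load series, for example to produce test data for an anomaly detector. The exact attack models of [104] are not reproduced here: the linear ramp factor, the straight-line interpolation used as the "smooth curve", the function names and the sample values are all assumptions made for illustration.

```python
def ramping_attack(load, start, end, rate):
    """Scale load[start:end] by a linearly growing factor 1 + rate * k,
    where k counts the steps since the attack began (assumed ramp shape)."""
    attacked = list(load)
    for i in range(start, min(end, len(load))):
        attacked[i] = load[i] * (1.0 + rate * (i - start + 1))
    return attacked


def smooth_curve_attack(load, start, end):
    """Replace load[start:end] with a straight line joining the neighbouring
    genuine points, so the forged segment blends into the original series."""
    if start < 1 or end >= len(load) or start >= end:
        raise ValueError("segment must lie strictly inside the series")
    attacked = list(load)
    left, right = load[start - 1], load[end]
    span = end - start + 1
    for i in range(start, end):
        attacked[i] = left + (right - left) * (i - start + 1) / span
    return attacked


# Hypothetical hourly load values (MW).
hourly_load = [100.0, 102.0, 110.0, 125.0, 120.0, 104.0]
print(ramping_attack(hourly_load, 2, 5, 0.1))
print(smooth_curve_attack(hourly_load, 2, 4))
```

The ramping variant inflates the values progressively, whereas the smooth-curve variant removes the genuine peak while leaving no abrupt jump at the segment boundaries, which is what makes it harder to spot by simple difference checks.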


With respect to continuous monitoring of security, the rapid evolution and utilisation of ICT services in
EPES render necessary the presence of appropriate security monitoring and auditing solutions. SIEM
systems constitute a technology that dominates the scene. In particular, SIEM systems have the ability
to deploy multiple agents in a hierarchical manner to aggregate, normalise and correlate information
and security events from different resources, such as security-related events from end-user devices,
servers, network devices and operating systems [105], [106]. Moreover, they can integrate various
security mechanisms, such as firewall, availability monitoring, asset discovery, vulnerability
assessment and intrusion detection, in order to analyse logs and issue alert notifications or perform
another response when a cyberattack or malware is detected. Furthermore, these systems are
characterised and evaluated by the following features/capabilities: a) data sources supported, b) data
sources capabilities, c) processing capability, d) flexibility in security directives, e) behaviour analysis
at application level, f) risk analysis capability, g) resilience, h) security event management and
visualisation, i) reaction capability, j) deployment and support and k) licensing. Deliverable D2.1 of the
H2020 DiSIEM project ("In-depth Analysis of SIEMs extensibility") [107] evaluates a variety of SIEM
tools, including HP ArcSight, IBM QRadar, Intel McAfee Enterprise Security Manager, Alienvault OSSIM
and Unified Security Management (USM), XL-SIEM, Splunk and Elastic Stack, based on the
aforementioned criteria. Moreover, in [108], R. Leszczyna and M. R. Wróbel assess three open-source
tools, namely AlienVault OSSIM, Cyberoam iView and Prelude SIEM, for the smart electrical grid. Based
on the authors' quantitative analysis, AlienVault OSSIM presents the best performance. Table 13
summarises a selection of proprietary and non-proprietary SIEM tools.


Table 13: Summary of existing proprietary/non-proprietary SIEM tools.


Tool | Licensing | Functionality

IBM QRadar SIEM [109] | Proprietary | Log management; analytics; intrusion detection; data collection; risk modelling analytics to emulate attacks; insider threat detection; sense analytics

McAfee Enterprise Security Manager SIEM [110] | Proprietary | Runs via active directory with the focus on system security; compiles and correlates disparate data

RSA NetWitness Suite [111] | Proprietary | Extensive tools; automatically detect anomalous data patterns; adaptable; multiple use cases

Splunk Enterprise Security [112] | Proprietary | Network and machine data; combines log management with network analysis

ArcSight Enterprise Security Manager [113] | Proprietary | Compile log of big data; security orchestration; multi-tenancy & unified access matrix

LogRhythm [114] | Proprietary | Behavioural analysis; log correlation; Artificial Intelligence; diverse log types; threat management; network and system threat management; cybercrime detection

SolarWinds Security Event Manager [115] | Proprietary | Graphical data visualization; access to industry support

Trustwave Enterprise SIEM [116] | Proprietary | Suitable for diverse ICT infrastructure organizations; automated analysis by a cloud engine; unified data storage of logs, events, alerts, findings and incidents; threat management

Tenable Log Correlation Engine [117] | Proprietary | Cloud-based Virtual Machine (VM) platform; user resource tracking; measured by assets instead of IP addresses; vulnerability management; container security; web application scanning

Sumo Logic [118] | Proprietary | Control over full application and infrastructure stack; troubleshoot in real time; applications can be built, run and secured by users; log management and time series metrics; detect and predict

VMWare Log Insight [119] | Proprietary | Heterogeneous and scalable log management; faster troubleshooting across physical and virtual environment; handles machine logs, network traces, configuration file messages, system state dumps and application logs; built-in vSphere knowledge

EventTracker [120] | Proprietary | Threat intelligence integration; forensic analysis; system threat identification; vulnerability scan

Loggly [121] | Proprietary | Proactive real time log monitoring; app performance tracking; system behaviour monitoring; config management; web services management; big data infrastructure support

Xpolog [122] | Proprietary | Network errors and security risk identification; track system problems; fix malfunctions; agent-less technology

NetIQ Sentinel [123] | Proprietary | Access control; security management; workload migration; disaster recovery; VoIP for unified communication; hybrid environment support

SecureVue Cloud [124] | Proprietary | Vulnerability management; patch management; security monitoring; Co-SIEM with Splunk

Prelude [125] | Non-proprietary | A framework to unify open source SIEM platforms; multisource event logs; filtering; correlation; analysis; visualization

OSSIM [126] | Non-proprietary | Intrusion detection; behavioural monitoring; vulnerability assessment; open threat exchange portal; asset discovery & inventory


Furthermore, an effective countermeasure for the overall protection of EPES is the timely detection of
possible cyberthreats such as malware, cyberattacks and anomalies by means of an IDS. In particular,
the goal of an IDS is to detect possible attacks and anomalies and either to inform the system operator
or the security administrator in a timely manner or to perform some countermeasures. The typical
architecture of an IDS is illustrated in Figure 5. Its agents monitor either a single host or the network
traffic generated by many devices. Based on this discrimination, IDS systems can be classified into two
categories: 1) HIDS (Host Based) and 2) NIDS (Network Based). Accordingly, the Analysis Engine receives
the information collected by the agents and tries to detect possible cyberattack or anomaly patterns.
The detection mechanisms applied by the analysis engine can be classified into three categories: a)
signature-based, b) anomaly-based and c) specification-based. The first one matches the information
collected by the agents with known and verified attack signatures. The second category attempts to
identify possible anomalies in behavior by adopting statistical analysis and AI techniques, in order to
compare normal profiles (e.g. from a dataset) with the observed events. Based on a threshold, it
decides whether there is an anomaly or not. The last category matches the information collected by
the agents with a set of specifications that determine the legitimate behaviors. Finally, the Response
Module informs the responsible administrator about the possible cyberattacks and anomalies and, in
some cases, it can also perform appropriate preventive actions. Subsequently, various IDS systems
devoted to protecting EPES are analyzed.


Figure 5: IDS Architecture [127].

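The three detection mechanisms described above can be contrasted with a deliberately simplified sketch. It is not the design of any particular IDS: the signature labels, the z-score used as the anomaly measure, the default threshold of three standard deviations and the sample traffic figures are all assumptions chosen only to show how each category reaches a decision.

```python
from statistics import mean, stdev

# Hypothetical labels standing in for known and verified attack signatures.
KNOWN_SIGNATURES = {"write_coil_flood", "unauthorised_firmware_upload"}


def signature_match(event_label):
    """Signature-based: flag events that match a known attack signature."""
    return event_label in KNOWN_SIGNATURES


def anomaly_score(value, baseline):
    """Anomaly-based: distance of an observation from the normal profile,
    expressed in standard deviations (z-score)."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(value - mu) / sigma if sigma > 0 else 0.0


def is_anomalous(value, baseline, threshold=3.0):
    """Decide on an anomaly by comparing the score with a threshold."""
    return anomaly_score(value, baseline) > threshold


def specification_ok(function_code, allowed_codes):
    """Specification-based: only explicitly permitted behaviour is legitimate."""
    return function_code in allowed_codes


# Normal profile: packets per second observed during regular operation.
normal_packets_per_second = [48, 52, 50, 49, 51, 50, 47, 53]
print(is_anomalous(51, normal_packets_per_second))
print(is_anomalous(90, normal_packets_per_second))
```

In this toy setting the signature and specification checks are exact set lookups, while the anomaly check depends on the chosen threshold: lowering it catches more deviations at the cost of more false alarms.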
3.3.3 Key performance indicators 


The following table organises the most common and useful KPIs utilised to evaluate a detection
mechanism such as an IDS system. Before analysing these metrics, the following terms should be
explained. True Positive (TP) is the number of correct classifications that identified
cyberattacks as abnormal behaviour. On the other hand, True Negative (TN) is the number
of correct classifications that recognised non-malicious activities as normal behaviour. Accordingly,
False Positive (FP) is the number of mistaken classifications that recognised normal
activities as malicious behaviour. Finally, False Negative (FN) is the number of mistaken
classifications that recognised cyberattacks as normal behaviour. Based on the aforementioned terms
and [127], the following metrics are defined.
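As a brief illustration, the four counts can be tallied from paired ground-truth and predicted labels, and the ratio metrics listed in Table 14 can then be derived from them. The sketch below is a minimal example under stated assumptions: label 1 denotes a cyberattack and 0 normal behaviour, the function names are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here rather than part of the definitions.

```python
def confusion_counts(y_true, y_pred):
    """Tally TP, TN, FP, FN; label 1 = cyberattack, 0 = normal behaviour."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def detection_metrics(tp, tn, fp, fn):
    """Derive the ratio metrics from the four counts."""
    def ratio(num, den):
        # Convention assumed here: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    tpr = ratio(tp, tp + fn)
    return {
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
        "Precision": precision,
        "TPR": tpr,
        "TNR": ratio(tn, tn + fp),
        "FPR": ratio(fp, fp + tn),
        "FNR": ratio(fn, fn + tp),
        "F1": ratio(2 * precision * tpr, precision + tpr),
    }


# Imbalanced set: 98 normal samples, 2 attacks, detector labels everything normal.
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100
metrics = detection_metrics(*confusion_counts(y_true, y_pred))
print(metrics["ACC"], metrics["TPR"], metrics["F1"])
```

With this imbalanced sample the accuracy is 0.98 although no attack is detected (TPR and F1 are both 0.0), which is why accuracy alone can be misleading for intrusion detection.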


Table 14: Key performance indicators for detection. 
Accuracy (ACC) 


Definition ACCS TP + TN 
© TP+TN + FP +FN 


Description ACC denotes the proportion between the correct predictions and the total 
number of samples. ACC is considered an effective metric when there is an 
equivalent number of samples between the predefined classes. For example, if 
a training set consists of 98% normal behaviour samples and 2% malicious 
behaviour samples, then the training accuracy of the classification model can 
easily approach 98%, classifying each case as normal behaviour. On the other 
hand, if the training set consists of 60% normal behaviour samples and 40% 
malicious behaviour samples, then the training accuracy might be decreased at 
60%. Therefore, in some cases, ACC can trick a security operator or the security 
administrator by giving the mistaken sense of achieving high classification ACC. 


Precision 


Definition n TP 
© TP+FP 


Description Precision indicates what proportion of the samples classified as malicious behaviour 
are indeed malicious. Consequently, Precision provides information regarding the performance of 
the classification with respect to FP. 


True Positive Rate (TPR) 


Definition $TPR = \frac{TP}{TP + FN}$ 


Description TPR indicates what proportion of the actual intrusions was classified as an 
intrusion. In contrast to Precision, TPR provides information concerning FN. It is noteworthy 
that TPR is also called Recall or Sensitivity. 
True Negative Rate (TNR) 


Definition $TNR = \frac{TN}{TN + FP}$ 


Description TNR is calculated by dividing TN by the sum of TN and FP, identifying the proportion 
of normal behaviours that were classified as normal. In other words, TNR is the counterpart of 
TPR for the normal class. In some cases, TNR is also named Selectivity or Specificity. 


False Positive Rate (FPR) 
Definition $FPR = \frac{FP}{FP + TN} = 1 - TNR$ 


Description FPR is the complement of TNR, indicating the proportion of normal behaviours that 
are classified as intrusions. In particular, FPR, also known as Fall-Out, is defined as the ratio 
of FP to the sum of FP and TN. 


False Negative Rate (FNR) 


Definition $FNR = \frac{FN}{FN + TP} = 1 - TPR$ 


Description FNR is the complement of TPR, identifying the proportion of intrusions that are 
classified as normal behaviour. More specifically, FNR is calculated by dividing FN by the sum 
of FN and TP. 


F1 Score 
Definition $F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$ 


Description The F1 score represents the balance between Precision and TPR, thus considering both 
FP and FN. F1 is defined as the harmonic mean of Precision and TPR and provides a performance 
indication for anomaly detection mechanisms. Usually, F1 is more informative than ACC, mainly in 
cases of uneven class distributions. 


Area Under Curve (AUC) 


Definition 

$$AUC = \int_{0}^{1} TPR\left(FPR^{-1}(x)\right)\,dx = \int_{\infty}^{-\infty} TPR(T)\,FPR'(T)\,dT = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} I(T' > T)\,f_1(T')\,f_0(T)\,dT'\,dT = P(X_1 > X_0)$$ 

Where $X_1$ is the score for a positive instance, $X_0$ is the score for a negative instance, 
and $f_0$ and $f_1$ are probability densities. 


Description Receiver Operating Characteristic (ROC) curves are utilised to assess the efficacy of 
a classification process. A ROC curve plots FPR on the x-axis against TPR on the y-axis as the 
decision threshold varies. In order to summarise the performance shown by a ROC curve in a single 
numerical value, AUC is calculated. AUC is the probability that a classifier ranks a randomly 
selected positive event higher than a randomly selected negative event. 
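

To make the definitions of Table 14 concrete, the following minimal Python sketch computes the 
confusion-matrix KPIs and estimates AUC directly from its probabilistic interpretation, P(X1 > X0). 
The names ConfusionCounts, detection_kpis and auc_by_ranking are illustrative assumptions and do 
not originate from [127]; the pairwise AUC estimate is only practical for small samples. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # attacks classified as attacks
    tn: int  # normal behaviour classified as normal
    fp: int  # normal behaviour classified as attacks
    fn: int  # attacks classified as normal


def _ratio(num, den):
    # Return None instead of raising when a class is absent from the sample.
    return num / den if den else None


def detection_kpis(c):
    """Compute the confusion-matrix KPIs of Table 14 (illustrative helper)."""
    precision = _ratio(c.tp, c.tp + c.fp)
    tpr = _ratio(c.tp, c.tp + c.fn)
    f1 = None
    if precision is not None and tpr is not None and precision + tpr > 0:
        f1 = 2 * precision * tpr / (precision + tpr)
    return {
        "ACC": _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
        "Precision": precision,
        "TPR": tpr,
        "TNR": _ratio(c.tn, c.tn + c.fp),
        "FPR": _ratio(c.fp, c.fp + c.tn),
        "FNR": _ratio(c.fn, c.fn + c.tp),
        "F1": f1,
    }


def auc_by_ranking(positive_scores, negative_scores):
    """Estimate AUC as P(X1 > X0) over all score pairs; ties count one half."""
    wins = 0.0
    for x1 in positive_scores:
        for x0 in negative_scores:
            if x1 > x0:
                wins += 1.0
            elif x1 == x0:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# A detector that labels every sample "normal" on a 98% / 2% split.
always_normal = detection_kpis(ConfusionCounts(tp=0, tn=98, fp=0, fn=2))
print(always_normal["ACC"], always_normal["TPR"])  # prints: 0.98 0.0
```

The last two lines reproduce the imbalanced case discussed for ACC: the accuracy is 98% although 
no intrusion is detected, which is why TPR, FPR and F1 should be reported alongside ACC. 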


3.3.4 Identified solutions 


In [127], a comprehensive survey of IDS for the smart grid is presented. In [128], A. Patel et 
al. proposed an anomaly-based IDS relying on an SVM, an OKB and a fuzzy analyser. In particular, 
this IDS can monitor the entire electrical grid ecosystem and consists of numerous HIDS and NIDS 
agents, each of which applies an SVM model trained by combining records from the KDD CUP 1999 
dataset with experiments carried out by the authors. Moreover, in order to reduce the false 
positives generated by the SVM model, a fuzzy logic technique was adopted, capable of determining 
a risk value between 0 and 1 for each entity of the electrical grid. Finally, an OKB was used to 
identify the target of the possible attacks. Based on the evaluation process, the AUC reaches 
0.994. 


Y. Zhang et al. [129] developed an IDS for the electrical grid that can monitor and control the 
network traffic exchanged between Home Area Networks, Neighbour Area Networks and Wide Area 
Networks in a hierarchical manner. Specifically, the proposed IDS consists of multiple units 
devoted to monitoring each of the aforementioned networks. Each IDS unit applies the 
AIRS2Parallel and CLONALG algorithms, which were trained with the NSL-KDD dataset. According to 
the evaluation analysis, the accuracy of AIRS2Parallel and CLONALG is 98.7% and 99.7% 
respectively. 


In [130], the authors presented an IDS for the AMI consisting of three units that monitor the 
network traffic generated by smart meters, data collectors and the AMI headend respectively. 
Concerning the detection process, the authors evaluate seven machine learning algorithms by using 
both the KDD CUP 1999 and NSL-KDD datasets. The algorithms evaluated are: 1) Single Classifier 
Drift, 2) Bagging using Adaptive-Size Hoeffding Tree, 3) Bagging using ADWIN, 4) Limited 
Attribute Classifier, 5) Leveraging Bagging, 6) Active Classifier, 7) Accuracy Updated Ensemble. 
Based on the experimental results, the Single Classifier Drift and the Active Classifier are 
suggested for the smart meters, Leveraging Bagging for the data collectors and the Active 
Classifier for the AMI headends. 


In [131], the authors developed an anomaly-based IDS for AMI, which monitors and controls the 
bidirectional Transmission Control Protocol/Internet Protocol (TCP/IP) network flows, which are 
aggregated periodically in the data collector component. The proposed IDS consists of four modules, 
namely 1) the Network Monitoring Module, 2) the Network Flow Extraction Module, 3) the Analysis 
Engine Module and 4) the Response Module. Regarding the detection process implemented by the 
Analysis Engine Module, a Classification And Regression Tree decision tree was deployed by utilising 
the CICIDS2017 dataset. Based on the evaluation analysis, the accuracy and the True Positive Rate of 
the proposed IDS reach 0.996 and 0.993 respectively. 


T. Morris et al. in [132] focus on the Modbus (over serial line) and Modbus TCP/IP protocols, 
providing 50 relevant signature rules. Modbus is an industrial protocol for the communication of 
SCADA systems, released by Gould Modicon (now Schneider Electric) in 1979. In particular, each 
rule provided by this paper is defined for a specific field using the Snort IDS syntax. It is 
noteworthy that the authors do not provide numerical evaluation results. 


In [133], B. Kang et al. implemented a signature-based IDS for IEC 61850 substations by using the 
Suricata IDS. In more detail, a stateful analysis plugin was implemented into Suricata, whose 
architecture is divided into three units, namely 1) Manufacturing Message Specification (MMS) 
decoder, 2) rule match engine and 3) state manager. The first unit decodes the MMS packets by 
extracting their 
attributes. The second unit applies the signature rules, while the role of the last unit is to update the 
state of the protected devices. Concerning the evaluation process, two cyberattacks were performed 
and detected successfully. 


In [134], Y. Yang et al. implemented a specification-based IDS devoted to protecting synchrophasor 
systems utilising IEEE C37.118. In particular, their IDS is composed of 1) access control rules, 2) protocol 
rules and 3) behaviour rules. The access control rules determine the legitimate Medium Access Control 
(MAC) and the Internet Protocol (IP) addresses as well as the corresponding transport layer ports 
permitted to transmit and receive network packets. The protocol rules define that only IEEE C37.118 
network packets can be transmitted by the various entities. Finally, the last category adopts a deep 
packet inspection process, thus defining behaviour rules based on the attributes of IEEE C37.118. 
Concerning the evaluation process, the False Positive Rate is approximately 0%. 
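

The three rule classes described above for [134] can be illustrated with a minimal Python sketch. 
It is not the implementation of [134]: the Packet structure, the whitelist values, the port 
number and the data_rate bound are hypothetical placeholders, and the packet is assumed to have 
been decoded already by some upstream component. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_mac: str
    src_ip: str
    dst_port: int
    protocol: str   # label assigned by a hypothetical upstream decoder
    fields: dict    # decoded protocol attributes


# Hypothetical whitelist entries; a real deployment would load its own.
ALLOWED_MACS = {"00:1b:44:11:3a:b7"}
ALLOWED_IPS = {"10.0.0.5"}
ALLOWED_PORTS = {4712}
MAX_DATA_RATE = 60  # assumed upper bound on a reporting-rate attribute


def check_packet(pkt):
    """Return the name of the first violated rule class, or None if all pass."""
    # 1) Access control rules: legitimate MAC/IP addresses and transport ports.
    if (pkt.src_mac not in ALLOWED_MACS
            or pkt.src_ip not in ALLOWED_IPS
            or pkt.dst_port not in ALLOWED_PORTS):
        return "access-control"
    # 2) Protocol rules: only IEEE C37.118 packets may be exchanged.
    if pkt.protocol != "C37.118":
        return "protocol"
    # 3) Behaviour rules: deep packet inspection of protocol attributes.
    rate = pkt.fields.get("data_rate")
    if rate is None or not (0 < rate <= MAX_DATA_RATE):
        return "behaviour"
    return None


legit = Packet("00:1b:44:11:3a:b7", "10.0.0.5", 4712, "C37.118", {"data_rate": 30})
spoofed = Packet("de:ad:be:ef:00:01", "10.0.0.5", 4712, "C37.118", {"data_rate": 30})
print(check_packet(legit), check_packet(spoofed))  # prints: None access-control
```

Because a specification-based IDS only accepts what is explicitly permitted, any packet that 
fails one of the three checks raises an alert, which is consistent with the very low False 
Positive Rate reported when the specification is accurate. 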


The work in [135] proposed an intrusion detection method for AMI, which is mainly based on the 
OS-ELM technique. OS-ELM is a special feedforward neural network model which utilises online 
sequence learning for its training process. In more detail, the scheme's methodology consists of 
three basic phases: a) data pre-processing phase, b) initialisation phase and c) online sequence 
learning phase. During the first phase, the training data is pre-processed by using the Gain 
Ratio Evaluation feature selection method. In the second phase, the parameters for the training 
process of the neural network are initialised randomly, while the third phase is the training 
process itself. The training process utilised the dataset that can be found on the website [136]. 
Nevertheless, the specific dataset includes neither network records that identify cyberattacks 
nor abnormal behaviour patterns. Furthermore, multiple experiments were conducted during the 
evaluation process to determine the appropriate parameters for the presented model. In addition, 
other classification algorithms were evaluated for comparison. It was stated that the proposed 
solution outperforms the other algorithms and that Accuracy approaches 97.239%. Accordingly, FPR 
and FNR are calculated at 5.897% and 3.614%, respectively. 


Chen et al. [137] present an anomaly-based intrusion detection method which is focused on the false 
data injection attacks. The proposed scheme is based on a spatiotemporal evaluation, able to control 
the correlations between the state estimations of AMI. State estimations refer to actions like energy 
supply/demand and electricity pricing. The presented method is divided into two phases. The first 
phase involves the creation of a set of state estimations, which is characterised by spatial correlations 
and temporal consistencies. The second phase includes the employment of a voting system which 
classifies each state estimation into three categories: a) good, b) abnormal and c) unknown. Two false 
data injection attacks were simulated in order to evaluate the current scheme. The first attack focused 
on maximising the energy transmission costs, in contrast to the second attack that intended to cause 
a power outage. Regarding the first attack, it was noticed that the proposed method does not generate 
any False Positive. On the other hand, the second attack results in 0.43% FPR. 


The work in [138] presented an IDS which exclusively focuses on blackhole attacks. Blackhole 
attacks constitute Denial of Service attacks which aim to drop all network packets by advertising 
malicious nodes or malicious paths. In more detail, the proposed system enables control over the 
communications of an AMI NAN. The Network Simulator 2 was utilised in order to deploy the 
specific kind of attack, while the Ad-Hoc On-Demand Distance Vector protocol was also employed to 
examine the AMI network as an ad-hoc network. The simulation included 100 smart meter nodes, 1 
data collector and 2 malicious nodes. In the simulation environment, the IDS can be considered as 
a separate node that communicates only with the data collector node. The Naive Bayes Classifier, 
which is based on the Bayes theorem, was applied to detect the possible blackhole attacks. The 
following features were used as input to the classifier: a) the number of route request packets, 
b) the number of route reply packets and c) the number of dropped packets. Regarding the 
performance evaluation of the current IDS, the Waikato Environment for Knowledge Analysis (WEKA) 
software was used. The authors claim that their system recorded 100% TPR, 99% Accuracy and 66% 
Precision, while AUC approaches 1. 
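

The combination of 100% TPR, 99% Accuracy and only 66% Precision may look contradictory, but it 
follows naturally from the class imbalance of the scenario. The short Python calculation below 
uses hypothetical confusion-matrix counts, chosen only because they are consistent with the 
reported figures; they are not taken from [138]. 

```python
# Hypothetical counts, chosen only to be consistent with the figures above.
tp, fn = 2, 0    # both malicious samples are detected
fp, tn = 1, 99   # one benign sample is flagged by mistake

tpr = tp / (tp + fn)
precision = tp / (tp + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"TPR={tpr:.0%} Precision={precision:.1%} Accuracy={accuracy:.1%}")
# prints: TPR=100% Precision=66.7% Accuracy=99.0%
```

With so few malicious samples, a single false positive is enough to reduce Precision to roughly 
two thirds while leaving Accuracy almost unchanged. 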


An anomaly-based intrusion detection framework for AMI was also presented by I. Ullah and H. 
Mahmoud in [139]. The proposed model is based on individual IDS modules that are placed in 
different locations in Home Area Networks, Neighbour Area Networks and Wide Area Networks 
correspondingly. The basic idea is to notify the AMI system administrator if a possible threat is 
detected by an IDS module. A central IDS module is also present, aggregating and examining the 
alarms generated by the various IDS modules. The WEKA software in cooperation with the ISCX2012 
dataset was employed in order to evaluate a plethora of machine learning classification 
algorithms. The specific dataset includes various network attacks falling into four categories: 
DoS, LAN to LAN (L2L), Secure Shell (SSH) and Botnet. The authors evaluated 20 algorithms, of 
which the most efficient are J48, JRip, BayesNet, SVM and MLP. The best-performing algorithm was 
J48, which achieved 99.70% Precision and 99.60% TPR. 


The clustering technique was utilised in [140] to implement a distributed IDS for AMI. The architectural 
components of the proposed system include multiple IDS units that are installed on the data collectors 
and the AMI headend. As a first step, the network traffic between the data collectors and smart meters 
is analysed and monitored by the IDS units of the data collectors. Potential anomalies are 
detected at this stage and a summary report is sent to the IDS unit of the AMI headend. Following 
that, the AMI headend investigates the specific anomalies further. The Mini-Batch K-Means 
algorithm is utilised in cooperation with a sliding window technique for the detection process. 
To train the Mini-Batch K-Means clustering algorithm, the authors developed a new dataset 
consisting of TCP/IP network features. The Principal Component Analysis (PCA) technique was 
employed in order to reduce the dimensionality of the dataset. The number of clusters (k) was set 
to 4, since this value achieved the best silhouette score and FPR. To evaluate their model, the 
authors simulated three attack scenarios: a) TCP SYN Flooding DoS attacks, b) stealth port 
scanning attacks and c) a combination of the previous ones. 


A deep learning-based intrusion detection approach for the AMI was also introduced in [141]. The 
proposed scheme promotes two lines of defence. The first line is the HIDS, which is deployed at 
the smart meters and the AMI backend server, aiming to protect the firmware, the operating system 
and the network interfaces of these devices. The second line of defence involves the Network 
Intrusion Detection System, which performs sniffing and inspection of the AMI network traffic 
while providing a broader examination of the entire network. The proposed classifier was trained 
and tested by utilising the NSL-KDD dataset, which includes 41 features. The classification 
accuracy depended upon the number of hidden layers, the number of nodes and the activation 
function. Finally, an experimental study showed that the proposed scheme outperforms the Random 
Forest, SVM and Naive Bayes based IDS approaches in terms of detection accuracy. 


The authors in [142] presented a new hierarchical and distributed intrusion detection system for 
the AMI against false data injection attacks. The proposed system is based on a distributed fog 
architecture using three hierarchical network levels: a) the AMI network layer containing 
different types of smart grid users, b) the Fog network layer involving a decentralised fog data 
centre for each microgrid, and c) the Cloud network layer involving a centralised operation 
centre for all Smart Grid operations. The behaviour of smart meter data measurements was studied 
using stochastic modelling based on a Markov chain. The transitions are represented between five 
basic states according to the filtering thresholds: authentic, suspicious max, suspicious min, 
malicious max and malicious min. The advantage of the proposed solution was demonstrated over 
different performance metrics and smart grid conditions. 


The work in [143] proposes a hybrid approach to detect anomalies associated with electricity 
theft in the AMI system. The proposed scheme is based on a combination of two machine learning 
algorithms: K-means and DNN. K-means is employed to identify groups of customers with similar 
electricity consumption patterns in order to understand different types of normal behaviour. On 
the other hand, the DNN algorithm is used to build an accurate anomaly detection model in order 
to discover changes or anomalies in usage behaviour. The model is also able to decide whether a 
customer has a normal or malicious consumption behaviour. Regarding the evaluation of the model, 
a real dataset from the Irish Smart Energy Trials was utilised. The results show a high 
performance of the proposed model compared to the models mentioned in the literature. 


In [144] P. Manso et al. presented an SDN-based IDS capable of detecting and addressing DDoS attacks. 
Their implementation belongs to the signature-based IDS family and utilises SDN in order to prevent 
and mitigate the various DDoS attacks. The detection part is based on the signature rules of Snort. On 
the other hand, the SDN architecture consists of the Mininet simulator and the Ryu controller. Mininet 
simulates a plethora of SDN-enabled switches and hosts, while Ryu controls the SDN-enabled switches. 
When an attack is detected by Snort, a corresponding signal is transmitted to the Ryu controller which 
then rearranges the network flows, utilising the respective OpenFlow commands. To evaluate their 
IDS, two attack scenarios were emulated and detected successfully, while the mitigation time is also 
calculated under different conditions. 


In [145], O. Igbe et al. presented an anomaly-based IDS devoted to protecting DNP3 
communications. The proposed IDS is composed of three main blocks, namely the packet capture 
block, the pre-processing block and the dDCA signal processing block. The first one is 
responsible for capturing the DNP3 network packets, the second one pre-processes the data coming 
from the first block, while the third one undertakes the detection process by implementing the 
deterministic Dendritic Cell Algorithm (dDCA). To evaluate the performance of their IDS, the 
authors created an artificial dataset composed of Man-in-The-Middle (MiTM) attacks, DNP3 Packet 
Modification and Injection Attacks, DNP3 Disable Unsolicited Messages Attacks, DNP3 Cold Restart 
Message Attacks and Distributed Denial of Service Attacks. Moreover, the authors indicate many 
features that can be used to detect anomalies concerning the DNP3 communications. Finally, 
Receiver Operating Characteristics (ROC) curves are used to assess the efficacy of the proposed 
IDS. 


Table 15: Summary of ID Systems in EPESs 


Literature Work | Target System | Detection Technique | Protocols | Attacks | Performance 
--- | --- | --- | --- | --- | --- 
A. Patel et al. [128] | Entire SG ecosystem | Anomaly-based | Not provided | DoS attacks; Packet splitting; Command insertion; Shellcode mutation; Brute force attacks; Payload mutation; Duplicate insertion | AUC = 0.99451 
Y. Zhang et al. [129] | Entire SG ecosystem | Anomaly-based | Not provided | DoS attacks; U2R attacks; R2L attacks; Probing attacks | Accuracy of CLONALG = 99.7%; Accuracy of AIRS2Parallel = 98.7% 
M. A. Faisal et al. [130] | AMI | Anomaly-based | Not provided | DoS attacks; U2R attacks; R2L attacks; Probing attacks | Accuracy of Active Classifier = [illegible]; FPR of Active Classifier = 3.31%; Accuracy of Single Classifier Drift = 97.74%; FPR of Single Classifier = 0.78%; Accuracy of Leveraging Bagging = 98.33%; FPR of Leveraging Bagging = 1.07% 
P. I. Radoglou et al. [131] | AMI | Anomaly-based / Decision tree | Transmission Control Protocol/Internet Protocol (TCP/IP) | Brute force attacks; DoS attacks; Web attacks; Infiltration attacks; Port scanning; Botnets | Accuracy = 0.996; TPR = 0.993 
T. H. Morris et al. [132] | SCADA | Signature-based | Modbus | Not provided | Not provided 
B. Kang et al. [133] | Substation | Signature-based | MMS / IEC 61850 | Active power limitation attacks | Two examples that were detected 
Y. Yang et al. [134] | SCADA | Specification-based | IEC-104 | Packet injection attacks; Replay attacks; Data manipulation | [illegible]; TPR = 100%; [illegible]; FPR = 0%; FNR = 0% 
Y. Li et al. [135] | AMI | Anomaly-based | Not provided | Not provided | Accuracy = 97.329%; FPR = 5.897%; FNR = 3.614% 
P. Y. Chen [137] | AMI | Anomaly-based | Not provided | False data injection attacks | FPR of the first attack = 0%; FPR of the second attack = 0.43% 
N. Boumkheld et al. [138] | AMI | Anomaly-based | AODV | Blackhole attacks | TPR = 100%; Accuracy = 99%; Precision = 66%; AUC = 1 
I. Ullah and H. Mahmoud [139] | AMI | Anomaly-based | Not provided | DoS attacks; L2L attacks; Secure shell attacks; Botnet | Precision = 99.70%; TPR = 99.60% 
F. A. A. Alseiari and Z. Aung [140] | AMI | Anomaly-based | Not provided | DoS attacks; Port scanning | ROC curves 
Z. El Mrabet et al. [141] | AMI | Deep-learning-based | TCP; UDP; [illegible] | DoS attacks; U2R attacks; R2L attacks; Probing attacks | Accuracy = 99.5% 
D. A. Chekired et al. [142] | AMI | Distributed-fog-based | Not provided | False data injection attacks | Presented in figures: Intrusion detection rate; Communication overhead; Computation time; Energy measurement; Price 
A. Maamar et al. [143] | AMI | Machine-learning-based | Not provided | Electricity theft | FPR = 8.86%; Detection rate = 95.38% 
P. Manso et al. [144] | IoT | SDN-based | OpenFlow; UDP | DoS attacks | DoS mitigation time = 3.07 seconds; Average round trip time = 0.541 ms; Packet loss = 0% 
O. Igbe et al. [145] | SCADA | Anomaly-based | DNP3; TCP/IP | DoS attacks; MiTM attacks | ROC curves 

In order to perform an evaluation analysis and calculate the aforementioned KPIs concerning the 
detection processes, artificial cyberattacks and anomalies are usually emulated. Also, publicly 
available intrusion/anomaly detection datasets can be used for this purpose. In [127], the 
authors provide a comprehensive analysis of various IDS for the smart grid, also describing the 
artificial cyberattacks and the datasets used for the evaluation process. Characteristic 
cyberattacks relevant to the smart grid that can be used are DoS attacks, MiTM attacks, brute 
force attacks, reconnaissance attacks, false data injection attacks, unauthorised access, traffic 
analysis attacks, infiltrations, botnets, etc. Moreover, in [146], the authors provide a detailed 
analysis of the available intrusion detection datasets. 


3.4 Respond 


This function, as specified by NIST, focuses on the development and implementation of activities 
for responding to detected cybersecurity incidents and containing their impact. 


3.4.1 Background on the function 


The NIST framework identifies the following categories of cybersecurity solutions which are relevant 
to the Respond function: 


• Response Planning: Response processes and procedures are executed and maintained, to ensure 
response to detected cybersecurity incidents. 

• Communications: Response activities are coordinated with internal and external stakeholders 
(e.g. external support from law enforcement agencies). 

• Analysis: Analysis is conducted to ensure effective response and support recovery activities. 

• Mitigation: Activities are performed to prevent expansion of an event, mitigate its effects, and 
resolve the incident. 

• Improvements: Organizational response activities are improved by incorporating lessons learned 
from current and previous detection/response activities. 


3.4.2 Theoretical background 


Incident response methodologies typically emphasize preparation—not only establishing an incident 
response capability so that the organization is ready to respond to incidents, but also preventing 
incidents by ensuring that systems, networks, and applications are sufficiently secure. Response 
processes and procedures are executed and maintained, to ensure the response to detected 
cybersecurity incidents. Networks, systems and applications should be monitored, through reviewing 
log entries and security alerts. As handlers become more familiar with the logs and alerts, they should 
be able to focus on unexplained entries, which are usually more important to investigate. Conducting 
frequent log reviews should keep the knowledge fresh, and the analyst should be able to notice trends 
and changes over time. The reviews also give the analyst an indication of the reliability of each source. 
The Response Planning process must be executed during and after an incident. In an attack 
intended to cause serious damage to the power system, the attacker will follow a certain 
procedure after the malware infection is accomplished. This will include communicating with the 
attacking server, distorting data and taking over internal authority. This means that the key to 
containing the damage caused by a cyberattack lies in discovering the attack at an early stage 
and responding rapidly to the incident. An effective Response Plan needs to guide utility 
personnel at all levels in managing a potential data breach in a way that supports rapid and 
thoughtful response activities. Key elements might include differentiating between types of 
breaches, creating an action item checklist, and reviewing and updating the response plan 
regularly [147]. 


Communication is key when responding to a cybersecurity incident. An organization should have 
multiple (separate and different) communication and coordination mechanisms in case one 
mechanism fails. Such communication mechanisms are described in [1] and include: 


• Contact information for team members and others within and outside the organization (primary 
and backup contacts), such as law enforcement and other incident response teams; information 
may include phone numbers, email addresses, public encryption keys (in accordance with the 
encryption software described below), and instructions for verifying the contact's identity. 

• On-call information for other teams within the organization, including escalation information. 

• Incident reporting mechanisms, such as phone numbers, email addresses, online forms, and secure 
instant messaging systems that users can use to report suspected incidents; at least one 
mechanism should permit people to report incidents anonymously. 

• Issue tracking system for tracking incident information. 

• Smartphones to be carried by team members for off-hour support and onsite communications. 

• Encryption software to be used for communications among team members, within the 
organization and with external parties. 

• War room for central communication and coordination; if a permanent war room is not necessary 
or practical, the team should create a procedure for procuring a temporary war room when 
needed. 

• Secure storage facility for securing evidence and other sensitive materials. 

• Evidence gathering accessories, including hard-bound notebooks, digital cameras, audio recorders, 
chain of custody forms, evidence storage bags and tags, and evidence tape, to preserve evidence 
for possible legal actions. 

Analysis of the cybersecurity incident should consider whether technical capability gaps 
contributed to the attacker's success or whether people or process gaps were the main culprit. 
Using an application or a database, such as an issue tracking system, helps ensure that the 
cyberattack is analysed and resolved in a timely manner. The National Institute of Standards and 
Technology [148] has developed recommendations on security incident handling, including what 
information should be collected for each incident [1]. The issue tracking system should contain 
information on the following: 


• Characterization of the incident phase: new, in progress, under investigation, settled. 

• Incident information, i.e. brief description, indicators, other related incidents. 

• Detailed sequence of actions taken by people involved in the incident. 

• Impact assessments related to the incident. 

• Contact information of involved parties, evidence gathered and respective comments. 

• Next steps. 
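

The items listed above map naturally onto a record in an issue tracking system. The following 
Python sketch shows one possible structure; the class and field names (IncidentRecord, Phase, 
log_action) are illustrative assumptions and are not prescribed by [1] or [148]. 

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Phase(Enum):
    NEW = "new"
    IN_PROGRESS = "in progress"
    UNDER_INVESTIGATION = "under investigation"
    SETTLED = "settled"


@dataclass
class IncidentRecord:
    description: str                                          # brief description
    phase: Phase = Phase.NEW                                  # incident phase
    indicators: list = field(default_factory=list)            # observed indicators
    related_incidents: list = field(default_factory=list)     # other incidents
    actions: list = field(default_factory=list)               # (time, actor, action)
    impact_assessments: list = field(default_factory=list)
    contacts: list = field(default_factory=list)              # involved parties
    evidence: list = field(default_factory=list)              # evidence and comments
    next_steps: list = field(default_factory=list)

    def log_action(self, actor, action, when=None):
        """Append one entry to the chronological sequence of actions taken."""
        when = when or datetime.now(timezone.utc)
        self.actions.append((when, actor, action))


record = IncidentRecord(description="Unexpected Modbus writes on substation LAN")
record.log_action("analyst-1", "Isolated the affected RTU from the network")
record.phase = Phase.UNDER_INVESTIGATION
print(record.phase.value, len(record.actions))  # prints: under investigation 1
```

Keeping the sequence of actions as timestamped entries supports the later lessons-learned 
analysis, since the timeline of the response can be reconstructed from the record. 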


Mitigation activities are performed to prevent the expansion of an event and to resolve the 
incident. A DDoS attack against a SCADA network can be targeted at any node in its tree 
structure, e.g., a smart meter, an RTU or a relay port. Honeypots are designed and deployed to 
lure attackers and absorb their attacks. They can collect evidence and help hide the real 
servers. By embedding honeypots into the real servers, the real servers can serve as an internal 
network on the honeypots' network port mapping, which can increase the safety ratio of the real 
servers. Regarding data distortion, e.g. in load forecasting, response measures to suspicious 
subsequences may include replacing them, once identified, with data points from historical data 
on a similar day. Historical data on a similar day certainly differ from the real data but are 
not expected to be very different. Therefore, replacing the identified abnormal data will not 
have significant adverse impacts on the forecasting results. Another major response is 
straightforward, i.e., using alternative forecasting models if it is determined that the 
cyberattack has corrupted the forecasting model [149]. Regarding the mitigation of cyberattacks, 
some materials are needed, such as: 


• a computer, loaded with appropriate software (e.g., packet sniffers, digital forensics). This 
computer should be scrubbed, and all software reinstalled before it is used for another incident. 
Note that because this computer is for special purpose, it is likely to use software other than the 
standard enterprise tools and configurations, and whenever possible the incident handlers should 
be allowed to specify basic technical requirements for these special purpose investigative 
computers. In addition to an investigative computer, each incident handler should also have a 
standard computer, smart phone, or other computing device for writing reports, reading email, 
and performing other duties unrelated to the hands-on incident analysis. 

• backup devices, blank media, and the necessary networking equipment. 
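

Returning to the data distortion response described above for load forecasting, the similar-day 
replacement can be sketched in a few lines of Python. The function name, the sample values and 
the assumption that suspicious indices have already been flagged by a detector are illustrative 
and are not taken from [149]. 

```python
def replace_with_similar_day(load, flagged, history):
    """Return a copy of `load` in which every flagged index takes the value
    observed at the same index on a similar historical day."""
    if len(load) != len(history):
        raise ValueError("load and history must cover the same time slots")
    repaired = list(load)
    for i in flagged:
        repaired[i] = history[i]
    return repaired


# Hypothetical readings; indices 2 and 3 were flagged as distorted.
today = [310.0, 305.0, 9999.0, 9999.0, 330.0]
similar_day = [300.0, 302.0, 315.0, 322.0, 328.0]

print(replace_with_similar_day(today, flagged=[2, 3], history=similar_day))
# prints: [310.0, 305.0, 315.0, 322.0, 330.0]
```

Only the flagged points are overwritten, so the unaffected measurements of the current day are 
still used by the forecasting model. 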


The organization implements Improvements by incorporating lessons learned from current and 
previous detection/response activities. Following a cybersecurity incident, it is important to 
update all cybersecurity incident response approaches, controls and related procedures. This is 
commonly done by performing trend analysis to help: (i) evaluate patterns and trends, (ii) 
identify common factors, (iii) determine the effectiveness of controls and (iv) evaluate the 
costs and impact of the cybersecurity events [104]. One of the most important parts of incident 
response is the improvement analysis. This should evolve to reflect new threats, improved 
technology, and lessons learned. Based on the improvement planning explained in [1], a detailed 
improvement plan should give answers to the following questions. 


• What is (are) the exact event(s) and when did it (they) happen? 

• What was the reaction of the personnel, did they follow procedures, were they effective? 

• What could have been done before the event(s) that may have helped? 

• What could be performed differently by the personnel in a future similar situation? 

• Are there any possible ways to improve information sharing with other organizations? 

• What corrective actions may prevent a future occurrence? 

• Are there any indicators to be monitored that detect similar incidents? 


3.4.3 Key performance indicators 


Incidents can occur in countless ways, so it is infeasible to develop a specific set of KPIs for 
responding to every incident. Organizations should be generally prepared to handle any incident but 
should focus on being prepared to handle incidents that use common attack vectors, since different 
types of incidents merit different response strategies. The incident response team is a critical 
component of the Information Security Management System (ISMS), which operates as an information 
repository in order to simplify and accelerate the mitigation of security incidents. Several recognized 
publications, such as [150], describe the importance of implementing procedures and controls for 
incident management. Tracking security measures and business outcomes may provide meaningful 
insight into how changes in granular security controls affect the completion of organizational 
objectives [1]. Based on the critical information required to make fact-based decisions, the following 
KPIs are defined [151]: 


Table 16: Key performance indicators for response. 


KPI: Possible measurements 


Number of events per service or application 
- Number of events / services 
- Number of events / applications 

Number of events per account 
- Number of events / accounts 
- Number of events / users 

Number of devices being monitored 
- Number of devices 
- Number of devices / analysts 

Total number of events 
- Number of events / hour / analyst 
- Number of events / day / analyst 
- Number of events / month / analyst 
- Number of events / year / analyst 
- Number of events / event type 

Number of events per device or host 
- Number of events per device or host / day 
- Number of events per device or host / month 
- Number of events per device or host / year 
- Number of events / device or host / type 
- Number of events / operating system type 

Number of events per location 
- Number of events / departments 
- Number of events / offices 
- Number of events / regions 

Number of false positive alerts 
- Number of false positives / hours 
- Number of false positives / days 
- Number of false positives / months 
- Number of false positives / years 
- Percentage of events that are false positives 

Time to respond 
- Measured in minutes, hours or days 
- Average time to respond 
- Average time to respond / technology 
- Average time to respond / event type 
- Outliers 

Number of analysts assigned 
- Average number of analysts / event 
- Average number of analysts / event type 
- Average number of analysts / level / event 
- Average number of analysts / level / event type 

Escalation level 
- Average number of events / level 
- Average number of events / level / time period 
- Escalation level / event type 
- Escalation level / technology 
- Average time (in minutes or hours) to escalate 

Event source 
- Total number of events / technology 
- Total number of events / technology / time period 
- Total number of false positives / technology 
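
As a minimal sketch of how a few of the KPIs in Table 16 could be derived from an event log, the 
following Python example computes the number of events per event type, the percentage of events 
that are false positives, and the average time to respond. The record layout (the `SecurityEvent` 
class and its field names) and the sample data are assumptions made for illustration only; they are 
not prescribed by [151]. 

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SecurityEvent:
    """Hypothetical event record; the field names are assumptions."""
    event_type: str
    source: str              # technology that raised the event, e.g. "IDS"
    raised_at: datetime
    responded_at: datetime
    false_positive: bool


def events_per_type(events):
    """KPI 'Number of events / event type'."""
    return Counter(e.event_type for e in events)


def false_positive_percentage(events):
    """KPI 'Percentage of events that are false positives'."""
    if not events:
        return 0.0
    return 100.0 * sum(e.false_positive for e in events) / len(events)


def average_time_to_respond(events):
    """KPI 'Average time to respond', returned as a timedelta."""
    if not events:
        return timedelta(0)
    total = sum((e.responded_at - e.raised_at for e in events), timedelta(0))
    return total / len(events)


# Illustrative sample data (invented for this sketch).
t0 = datetime(2020, 1, 1, 8, 0)
sample_events = [
    SecurityEvent("port-scan", "IDS", t0, t0 + timedelta(minutes=10), True),
    SecurityEvent("port-scan", "IDS", t0, t0 + timedelta(minutes=20), False),
    SecurityEvent("malware", "SIEM", t0, t0 + timedelta(minutes=30), False),
    SecurityEvent("malware", "SIEM", t0, t0 + timedelta(minutes=60), False),
]

print(events_per_type(sample_events))
print(false_positive_percentage(sample_events))
print(average_time_to_respond(sample_events))
```

The same pattern extends to the per-technology or per-analyst variants of the KPIs by grouping the 
records on the corresponding field before aggregating. 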


3.4.4 Identified solutions 


The latest European Commission Recommendation on cybersecurity in the energy sector (C(2019) 
2400 final of 3.4.2019) sets the basis for the actions mandated to energy operators regarding the 
real-time requirements of energy infrastructure components, cascading effects, and the combination 
of legacy and state-of-the-art technology, and specifies a clear time plan for the monitoring and 
review of the Recommendation in the national regulation of the Member States. As far as responding 
to cyber-attacks on the energy sector is concerned, the Recommendation declares that "there should 
be structured communication 
channels and agreed formats in place in order to share sensitive information with all relevant 
stakeholders, Computer Security Incident Response Teams, and relevant authorities". Additionally, it 
identifies specific guidelines regarding preparedness measures for cascading effects in interconnected 
electricity and gas networks and requires communication and control networks to be designed “with 
a view to confining the effects of any physical and logical failures to limited parts of the networks and 
to ensuring adequate and swift mitigation measures". This will be practically implemented by the 
formulation of "tenders with cybersecurity in mind, that is to say demand information about security 
features, demand compliance with existing cybersecurity standards, ensure continuous alerting, 
patching and mitigation proposals if vulnerabilities are discovered, and clarify vendor liability in the 
event of cyber-attacks or incidents". 


All these clearly stated mandates for the Member States put cybersecurity at the top of the agenda, 
in particular by requesting specific response and mitigation plans in all forthcoming tenders for 
communication and control systems. In [152], a comprehensive cybersecurity application is presented, 
providing a smart grid security testbed that includes the set of control, communication, and physical 
system components needed to simulate an accurate cyber-physical environment. Availability and 
integrity attacks are simulated in a hardware-in-the-loop configuration with both isolated and 
coordinated approaches; these attacks are then evaluated and mitigated based on the physical 
system's voltage and rotor angle stability. In [153], authors from JPL (USA) apply systems engineering 
and fault management concepts to a cyber-physical scenario of a smart metering system. Building on 
their previous work on the inter-system interactions of a metering network with the power system 
during a load-drop attack, they apply new fault management concepts to extend that analysis, 
characterizing the range of cyber-attack patterns and prescribing detection and response techniques 
that reduce the consequences of such an attack. 


In [154], the authors deal with a complex network of phasor measurement units (PMUs) that collect 
real-time data to increase smart grid observability. They propose a risk mitigation model for the 
optimal response to cyber-attacks on the PMU network, formulated as a mixed integer linear 
programming (MILP) problem, in order to prevent the propagation of the cyber-attacks and maintain 
the observability of the power network. In [155], the authors identify trends and recent results on 
system response and reconfiguration under cyber-attacks, categorizing them into two types: i) 
preventive, which identifies the vulnerabilities and modifies either control parameters or the 
redundancy of devices to increase cyber-resilience, and ii) reactive, which responds with specific 
plans as soon as the attack is detected, e.g. by modifying the actions of the non-compromised 
controllers. In [156], the authors give a general presentation of security mechanisms for 
substation-level SCADA communication that uses a bump-in-the-wire (BITW) device, and propose a 
security solution to respond to cyber-attacks by integrating CDAC's key distribution and management 
protocol Sec-KeyD into IEC 62351 to secure the IEC 61850 protocol. Furthermore, in [157], the authors 
focus on cyber-attacks targeting EV infrastructure. Their response model isolates a subset of 
compromised and likely compromised EV supply equipment and minimizes the risk of attack 
propagation, while ensuring that equipment remains available to supply EV demand. 


In [158], the authors focus on a medium access control (MAC) layer intrusion detection and response 
system (IDRS) for wireless networks in smart grids, based on the concept of defence-in-depth. 
Additionally, in [159], the authors promote the use of an agent-based decentralized protection system 
using peer-to-peer communications, reputation-based trust and a data retransmission scheme to 
combat malicious attacks and other “Byzantine” failures. “Byzantine” is used in the sense that an 
intelligent device, such as an IED, can inconsistently appear both failed and functioning to 
failure-detection systems, presenting different symptoms to different observers. The electric power 
and communication synchronizing simulator (EPOCHS) federated simulation platform is used to show 
that the resulting special protection and response system successfully defends against malicious 
attacks by a cyber intruder. In [160], Software-Defined Networks (SDN) and Network Function 
Virtualization are proposed to facilitate incident response to a variety of cyber-attacks against 
industrial energy networks. A prototype incident-response solution is presented that automatically 
detects and responds to cyber-attacks targeting sensors and controllers. 


In [161], the authors focus on cyber-attacks affecting geographically dispersed distributed generators 
(DGs), which are generally aggregated into a virtual power plant (VPP). Distributed control schemes 
are used to achieve optimal economic dispatch of the VPP, but they are susceptible to communication 
failures and cyber-attacks, such as non-colluding and colluding attacks. An attack-robust distributed 
economic dispatch strategy is proposed in which every DG monitors the behaviour of its in-neighbours, 
obtains the network connectivity information, detects the misbehaving DGs residing in the network 
and responds by isolating them, so that the remaining well-behaving DGs can still accomplish the 
economic dispatch. Finally, in [162], Denial of Service (DoS) attacks targeting electric power utilities 
are investigated. As a response countermeasure, the authors propose and test enabling cyber 
elements to reconfigure the system's routing topology in a distributed manner, so that malicious 
nodes are isolated. A collaborative reputation-based topology configuration scheme is proposed and, 
through game-theoretic principles, the authors prove that a low-latency Nash Equilibrium routing 
topology always exists for the system: the remaining nodes converge quickly to an equilibrium 
topology and maintain dynamical stability in the specific instance of an islanded microgrid system. 



3.5 Recover 


This function, as specified by NIST, focuses on the development and implementation of suitable 
activities and plans for resilience and for the timely restoration of capabilities impaired by 
cybersecurity incidents. 


3.5.1 Background on the function 


The NIST framework identifies the following categories of cybersecurity solutions which are relevant 
to the Recover function: 


• Recovery Planning: Recovery processes and procedures are executed and maintained to ensure 
restoration of systems or assets affected by cybersecurity incidents. 

• Improvements: Recovery planning and processes are improved by incorporating lessons learned 
into future activities. 

• Communications: Restoration activities are coordinated with internal and external parties (e.g. 
coordinating centres, Internet Service Providers, owners of attacking systems, victims, other 
CSIRTs, and vendors). 


3.5.2 Theoretical background 


Protection and prevention techniques are the first line of defence of any organization, but they 
cannot fully protect its business from all cyberattacks. While it is preferable to avoid a cyberattack, 
some attacks simply cannot be stopped. Therefore, recovering and getting back to normal business, as 
unscathed as possible, is a key responsibility of cybersecurity experts. This is achieved through the 
definition of a recovery plan, which aims to retrieve and restore the systems, data and functionality 
that were the target of a cyberattack. Given the characteristics of current solutions, the plan needs 
to cover both technology and people. More specifically, recovery planning is commonly defined in 
terms of: 

• Goal: protect service, business and data assets after a cybersecurity attack 

• Initial plan: design an approach for collecting evidence and preserving it in a safe way 

• Reaction: design and implement techniques for preventing future losses 

• Management of the planning: have a dedicated team that keeps up to date with new cybersecurity 
attacks and adapts to the needs of the organization and the cyberthreat landscape 


One of the first activities an organization should perform when creating a recovery plan is to educate 
and train the cybersecurity staff to respond to breaches, denial of service attacks and other types of 
attacks, so that they can react faster while the system is vulnerable. At the same time, it is important 
to improve and refine the existing cybersecurity solutions of the organization in order to assemble a 
complete cyber monitoring program with a high level of interaction. Additionally, it is important that 
the team works around a common cybersecurity framework so that it can evaluate and validate, in 
real time, the resilience of the digital systems and the strength of their defences. 


The cybersecurity team must be able to gather evidence and to preserve and analyse data for forensic 
investigation. This should cover both known and unknown cyberattacks. For known attacks, a specific 
action plan for recovery and remediation should be defined, while mitigation and reaction plans 
should also be defined for unknown or unforeseen scenarios. 


The output of this investigation is used to develop a plan of short-term remediation actions, so that 
critical business operations can resume as soon as possible, and a long-term risk mitigation plan based 
on the lessons learned from the investigation. The plan should cover, among others, the following 
points: 

• Define a management structure for roles and responsibilities 

• Define, together with the response plan, a business plan (how the business of the organization 
could continue after the attack with the maximum availability) and a crisis management strategy 

• Determine communication channels, both internal and external, in case of a cyberattack 

• Identify and introduce alternatives for providing current services and storage for data 

• Develop and manage “what-if” scenarios using cyberattacks that have targeted organizations in 
your domain of work (e.g. what happens if I am the victim of a ransomware attack?) 

• Manage and fix issues and inconsistencies before incidents happen 

• Include in the plan the impact a cyberattack would have in the legal and financial areas 


Recovery plans are dynamic in nature, which means they should be regularly refined and updated in 
order to cover new cyberattacks and threats. As mentioned before, the plan must cover both technical 
and human areas, which is another reason why it must be kept continuously under revision: new 
requirements and constraints appear in the day-to-day operation of an organization. The plan must 
also evolve using the feedback from malicious events that occurred in the system and from attacks 
that affected similar businesses or domains. 


A usual strategy for keeping recovery plans up to date is to designate a dedicated team that 
periodically tests and evaluates them and proposes changes. The team should be composed not only 
of cybersecurity experts but also of other experts who can cover the business side and the application 
domain. This is particularly useful after a cyberattack, when the team can evaluate how useful the 
plan was, improve it in order to obtain better results, create new mechanisms, include more roles in 
the plan or assign different responsibilities. 


A good way of evaluating the recovery plan and how it can be improved is to define recovery metrics. 
These make it easier to define the minimum level of service, the criticality of data, the most 
important assets of the organization, and the financial impact. Some of the more useful metrics are: 

• Time-to-patch: the normal time needed to patch an application 

• Time to incident discovery: the time elapsed from the detection of an incident to the reaction 

• Time to mitigate vulnerabilities and apply recovery actions: the time needed to apply recovery 
actions in the organization 

• Scope of vulnerability analysis: how many systems the vulnerability analysis covers 

• Scope of the risk assessment analysis: how many systems are covered by the risk assessment 
analysis 

• Scope of the cybersecurity testing: the number of systems covered by the cybersecurity tests and 
validation done in the organization 

• Percentage of systems without vulnerabilities: the percentage of systems in the organization 
that do not have known vulnerabilities 

• Percentage of incidents detected: the share of incidents that are detected in the organization 

• Percentage of cybersecurity changes identified in the system: the percentage of cybersecurity 
issues identified in the system that need to be fixed or updated 

• Incident rate: the rate at which incidents happen 

• Budget for cybersecurity in the organization: how much budget is spent on cybersecurity 


One of the most important phases of recovery is communication, both internal and external, 
including its coordination and the roles involved. As noted above, the recovery plan must clearly 
define the communication plan to be followed when an organization is the victim of a cybersecurity 
attack, together with the responsibilities of the different roles involved. 


For communication to work correctly, one of the main requirements is to document as much 
information, procedures and metrics as possible. In each phase of the plan: 

• Design clear diagrams of systems, infrastructure and communications 

• List existing assets and systems, including support agreements and external services 

• Describe the dependencies and criticality of applications 

• Record the regulatory and legal information of the systems 

• Record the contact information of the members of the organization involved in the recovery team 


Regarding external communication, the organization should communicate in a transparent, efficient 
and clear way about the cyberattack it suffered, specifying the effect and repercussions it had on the 
company. More specifically, the communication should: 

• Be transparent about what happened, including a summary of what information or services were 
affected 

• State any action required from customers to protect themselves 

• Be understandable and empathetic towards the clients 

• Explain how the organization is going to improve its cybersecurity (e.g. identity protection) 

• Provide incentives for the customers 

• Clearly describe how the issue is being fixed 


On top of that, the management team must also identify any potential threat of legal, regulatory or 
financial action in order to address it as soon as possible. 


For internal communication, even though the recovery plan may specify different channels for 
information about the cyberattack, some of them may not be available depending on the issue. For 
example, if the internal network has been compromised, email or VoIP communications may not be 
secure. This is why the plan should define different scenarios and rehearse them, in order to discover 
how to behave in each one. Additionally, each team in the organization must clearly know to whom to 
report a cybersecurity incident and must provide updates as progress is made. This makes it easier to 
obtain information and to establish who can use it. 


Finally, cybersecurity information sharing is nowadays very important. When it is the victim of a 
cyberattack, the organization should consider sharing information about it with other organizations or 
public authorities. It is therefore important to compile information about the system following specific 
methodologies and formats, so that the information can be useful to as many organizations as possible. 
Such information, if received sufficiently in advance, can help to improve recovery planning and the 
definition of scenarios, which in turn makes it much easier to test the usefulness of the recovery plan. 


3.5.3 Key performance indicators 


This subsection describes an initial list of key performance indicators that can be used to evaluate the 
usefulness of the recovery planning, methodologies and solutions. The KPIs are separated into the 
three different phases of the recovery process: planning, improvements and communications. With 
respect to planning, the KPIs focus on the percentage of systems involved in the activity and the time 
required to recover the systems: 


Table 17: KPIs for recovery 


KPI: Measurements 


Percentage of updated systems 
- How many devices/systems are fully up to date? 

Percentage of systems recovered after a cyberattack 
- How many systems were completely recovered? 
- How many systems were affected by the cyberattack? 

Mean time to recovery 
- How much time until the system is restored? 

Mean time for reaction 
- How much time until the organization reacted in order to recover normal functionality? 

Mean time to apply methodology for recovery planning 
- How much time did it take to apply the current recovery plan? 
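
The planning KPIs of Table 17 can be illustrated with a minimal Python sketch. It assumes a 
hypothetical `Incident` record that stores when an incident was detected, when the first recovery 
action started, when normal functionality was restored, and how many systems were affected and 
recovered. The field names and the sample values are assumptions made for illustration, not part of 
any tool referenced in this document. 

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; the field names are assumptions."""
    detected_at: datetime
    reaction_at: datetime     # first recovery action started
    restored_at: datetime     # normal functionality restored
    systems_affected: int
    systems_recovered: int


def percentage_recovered(incidents):
    """KPI 'Percentage of systems recovered after a cyberattack'."""
    affected = sum(i.systems_affected for i in incidents)
    if affected == 0:
        return 100.0  # nothing was affected, so nothing is left unrecovered
    recovered = sum(i.systems_recovered for i in incidents)
    return 100.0 * recovered / affected


def _mean(deltas):
    """Arithmetic mean of a sequence of timedelta values."""
    deltas = list(deltas)
    if not deltas:
        return timedelta(0)
    return sum(deltas, timedelta(0)) / len(deltas)


def mean_time_to_recovery(incidents):
    """KPI 'Mean time to recovery' (detection until restoration)."""
    return _mean(i.restored_at - i.detected_at for i in incidents)


def mean_time_for_reaction(incidents):
    """KPI 'Mean time for reaction' (detection until first recovery action)."""
    return _mean(i.reaction_at - i.detected_at for i in incidents)


# Illustrative sample data (invented for this sketch).
t0 = datetime(2020, 3, 1, 9, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=30), t0 + timedelta(hours=4), 10, 8),
    Incident(t0, t0 + timedelta(minutes=90), t0 + timedelta(hours=8), 10, 10),
]

print(percentage_recovered(incidents))
print(mean_time_to_recovery(incidents))
print(mean_time_for_reaction(incidents))
```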


Furthermore, the KPIs of the improvement phase refer to the enhancements made to the recovery 
plan: 


Table 18: KPIs for improvement 


KPI: Measurements 


Percentage of time used 
- Difference of time required to apply the improvement plan between the mean time and the 
current one 

Percentage of data recovered 
- Difference of data successfully recovered from the mean time to the actual one 

Percentage of data loss 
- Difference between data lost in the meantime compared with the actual one 


Finally, the KPIs for the communication phase focus on reached customers, impact and channels: 


Table 19: KPIs for communication 


KPI: Measurements 


External number of channels used for communication 
- The channels used for communicating with external entities/stakeholders 

External customers reached 
- Number of customers reached using external communication tools 

Number of tools for external communication 
- Number of different tools used for external communication 

Impact in external stakeholders 
- Number of customers affected/communicated 

Number of messages exchanged with customers 
- The number of messages (through any channel) exchanged with customers 

Number of messages for internal communication 
- The number of necessary messages used for internal communication of the recovery plan 

Number of people involved in the recovery planning 
- The number of employees involved in the recovery activities 


3.5.4 Identified solutions 


Regarding specific technology solutions, recovery actions are usually carried out by managing the 
safety and integrity of the data. This can be done using either online (e.g. redundant RAID drives) or 
offline (e.g. backup on offsite servers) backup systems. One approach to recovery is the bare-metal 
restore solution [163]. This technique creates a backup of a complete system, server or workstation, 
including the operating system, applications and data components that are necessary for running it 
separately on new hardware. Although this hardware requires a correct configuration in order to 
work properly, the advances made in virtualization make this much simpler nowadays. This strategy 
works better than the normal local disk image copy because the complete system is backed up, which 
greatly facilitates its deployment. 


Recovery planning is an activity that focuses on both technical solutions and human interaction. 
This task is not automatic and requires regular evaluation and updating due to the evolving nature 
of cyberattacks together with the technical and business needs of the organization. There are no 
specific tools that can be used for creating a recovery plan. Instead, it has a series of phases that 
need to be covered according to the specific needs and requirements of each organization: inventory 
of assets, data backup, creation of redundant systems, and creation of a communication plan. 
Therefore, in this phase we include existing tools that can be used for asset inventory, data backup 
and creation of redundant systems. Some existing solutions for asset inventory are: 


SolarWinds N-central [164]: This solution provides tools for managing devices in a complex 
environment. It offers user experience functionalities such as drag-and-drop, reordering, definition 
of profiles and settings for specific devices, and patch management. Additionally, it automates several 
functionalities such as device setup, self-healing responses, ticket creation and management, etc. 
Finally, it supports multiple types of devices such as endpoints, servers, network devices, virtual 
machines, mobile and IoT devices, etc. 


Freshservice [165]: The main objective of this tool is to maintain an inventory of IT and non-IT assets 
and track their details throughout their lifecycle. The solution supports asset auto-discovery, which 
automatically scans and maps all hardware and software and periodically updates the asset 
information. Inventory management helps to keep track of the assets (contracts, hardware, software, 
etc.) and evaluate their value in the system. Additionally, it provides asset lifecycle management, 
reporting and the ability to maintain a complete repository of all the assets, with in-depth visibility 
into how they are connected to each other, which helps to identify the impact of incidents and changes. 


SysAid [166]: This tool helps to view, secure, control and manage assets without the need for 
integration. It can analyse and manage many types of IT assets (e.g. hardware, software and other 
devices), their key attributes and relationships in a single view. SysAid automatically discovers assets 
and their attributes in the network (using both agent-based and agentless discovery options). 
It also provides patch management for Windows-based systems and a CMDB in order to automatically 
import data and give a more complete understanding of the status of the system. 


Regarding data backup, several solutions exist, ranging from automated backup with online/offline 
capabilities to virtualization of the entire system. The backup can be done locally, using a hybrid cloud 
or direct-to-cloud; each option has its own advantages and issues. Below we list some commercial 
solutions that are used nowadays: 


Acronis Data Cloud [167]: this solution is a backup and data recovery platform that follows a hybrid 
cloud strategy. Some of its key features include file sync and share, AI-based anti-ransomware 
technology, and Office 365 backup. The solution offers protection across physical, virtual, cloud and 
mobile platforms. 


IBM Spectrum Protect Plus [168]: this solution provides data protection and availability of data by 
supporting VMs, files and databases. The solution supports snapshots of specific states of the 
machines, management of data storage, instant recovery and data reuse. 


Nasuni Archive [169]: this data protection solution automatically tracks file usage and reclassifies 
inactive files to reduce the cost of data management. It provides unlimited capacity without the need 
for hardware upgrades, instant access to archived files and multi-cloud support. 


Rubrik Polaris Radar [170]: this is a SaaS platform that focuses on ransomware prevention, using 
machine learning to detect abnormal behaviour and to recover from attacks. It also monitors all data 
on premises and in the cloud that is managed by the Rubrik cloud data management platform. 


Furthermore, software solutions used for improvements fall into two groups: a) tools for managing 
recovery planning and b) tools for defining and measuring the KPIs. Examples of the first group are 
pre-defined agendas for board meetings and standardized templates for internal ad-hoc reports on 
recovery options. The second group consists of tools for recovery metrics. 


The main idea behind solutions supporting this aspect is to measure the efficiency and 
completeness of the current recovery planning, in order to identify which aspects need to be improved 
and how the updates or enhancements of existing functionalities provide a better result. 


One of the solutions in this area is Raygun APM [171]. Among other functionalities, this tool provides 
an MTTR-specific feature for measuring the time period between a service being detected as “down” 
and it returning to the “available” state. This measurement can be used for metrics such as 
availability of services, financial impact, etc. 
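
Independently of any particular product, the measurement described above can be sketched in a few 
lines of Python. The example assumes a chronologically ordered status log of `(timestamp, state)` 
pairs with the states "down" and "available"; this log format, the function names and the sample 
values are assumptions made for illustration and do not reflect the interface of Raygun APM. 

```python
from datetime import datetime, timedelta


def outage_durations(status_log):
    """Return the duration of every completed 'down' -> 'available' interval.

    status_log is a chronologically ordered list of (timestamp, state)
    tuples, where state is "down" or "available" (assumed format).
    """
    durations = []
    down_since = None
    for timestamp, state in status_log:
        if state == "down" and down_since is None:
            down_since = timestamp
        elif state == "available" and down_since is not None:
            durations.append(timestamp - down_since)
            down_since = None
    return durations


def mean_time_to_recovery(status_log):
    """Average time between a 'down' detection and the next 'available'."""
    durations = outage_durations(status_log)
    if not durations:
        return timedelta(0)
    return sum(durations, timedelta(0)) / len(durations)


def availability(status_log, window_start, window_end):
    """Fraction of the observation window during which the service was up.

    Only completed outages are counted, and all of them are assumed to lie
    inside the observation window.
    """
    downtime = sum(outage_durations(status_log), timedelta(0))
    return 1.0 - downtime / (window_end - window_start)


# Illustrative sample data (invented for this sketch).
day = datetime(2020, 1, 1)
status_log = [
    (day + timedelta(hours=2), "down"),
    (day + timedelta(hours=2, minutes=30), "available"),
    (day + timedelta(hours=10), "down"),
    (day + timedelta(hours=11, minutes=30), "available"),
]

print(mean_time_to_recovery(status_log))
print(availability(status_log, day, day + timedelta(days=1)))
```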


Finally, as presented previously, communications are supported by tools for internal and external 
communication. The internal communication tools cover the typical corporate messaging solutions 
such as email, instant messaging services, video chat, etc. Some of the more common ones are 
Gmail suite [172], Office Exchange [173], Skype for Business [174], and Facebook Workplace [175]. 
Tools for external communication (with stakeholders, clients, third parties, etc.) include email 
solutions and social networks. Email focuses on direct communication with specific people (for 
example, to handle questions and comments), while social networks are more generic and aim to give 
a general understanding of the status. Some of the more typical social networking solutions used for 
communication are Twitter [176], LinkedIn [177], and Facebook [178]. 


3.6 Additional concerns and a systemic approach for cyber security solutions 


This subsection discusses additional concerns and a systemic approach for cyber security solutions that 
must be considered for an SDN-based microgrid system. 


1. Authentication: authentication is the capability to establish the validity of a claimed identity 
[179]. An authentication mechanism verifies whether the exchanged information stems from the 
legitimate participants of the SDN-enabled grid. This matters because a malicious device may be 
able to inject counterfeit content or replay the same content into the SDN-enabled grid. More 
specifically, an adversarial grid application may attempt to insert new flow rules that 
circumvent the flow rules imposed by other applications [180]. Authentication can be based on 
three factors [181]: 

• Knowledge factor: proof of knowledge of a secret (e.g., passwords). 


• Possession factor: verification of credentials provided by the possession of specialized 
hardware. 


• Identity factor: evaluation of features unique to the claimant. 


These authentication techniques can be used individually or in combination within the 
SDN-enabled grid. However, an authentication scheme should involve minimal message exchange 
between grid devices, because the traffic in the grid is delay-sensitive and very intensive. In this 
context, Fouda et al. [182] proposed a lightweight mutual authentication protocol that combines 
a public key encryption scheme with the Diffie-Hellman key agreement scheme. 
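
To illustrate the Diffie-Hellman building block mentioned above, the following Python sketch shows 
two devices deriving the same session key and confirming it with a keyed hash. It is not the protocol 
of [182]: the public-key protection of the exchanged values is omitted, so the sketch on its own does 
not authenticate the parties. The group parameters are toy values chosen for readability, and the 
device names and helper functions are hypothetical. 

```python
import hashlib
import hmac
import secrets

# Toy parameters for illustration only: 2**127 - 1 is prime, but this group is
# far too small for real use; a deployment would use a standardized group.
P = 2**127 - 1
G = 3


def dh_keypair():
    """Generate a private exponent and the matching public value G^x mod P."""
    private = secrets.randbelow(P - 3) + 2   # uniformly chosen in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def session_key(own_private, peer_public):
    """Derive a 32-byte session key from the Diffie-Hellman shared secret."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def confirmation_tag(key, sender_id):
    """Keyed hash that proves knowledge of the session key."""
    return hmac.new(key, sender_id, hashlib.sha256).digest()


# Each device generates a key pair and sends only its public value.
meter_private, meter_public = dh_keypair()
gateway_private, gateway_public = dh_keypair()

# Both sides derive the session key from their own secret and the peer's value.
meter_key = session_key(meter_private, gateway_public)
gateway_key = session_key(gateway_private, meter_public)

# Key confirmation: the gateway checks the tag sent by the meter.
meter_tag = confirmation_tag(meter_key, b"meter")
confirmed = hmac.compare_digest(meter_tag, confirmation_tag(gateway_key, b"meter"))
print(confirmed)
```

Only two public values and one confirmation tag per direction are exchanged, which is in line with 
the requirement for minimal message exchange between grid devices. 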


2. Integrity: integrity ensures that data has not been altered or destroyed in an unauthorized 
manner [179]. It can also refer to the capability to detect if the exchanged content between the 
communicating devices of the grid has been altered or not. Within the SDN-enabled grid, 
modification of the flow rules or insertion of new rules by adversaries can cause severe damage 
to the regular operations of the grid [183]. Integrity is usually provided by appending a 
cryptographic digest of the message content to the message itself [184]. When PLCs, IEDs, 
applications and network controllers receive the message, they can check to see if the digest of 
the content matches the digest they compute on their end. If the digests match each other, then 
the message is deemed legitimate. 


There are several hashing algorithms (e.g., MD5, SHA-2, SHA-3) used for this service, which do
not require keys unless they are specifically designed to work with keys, as in keyed hashing
(e.g., HMAC, CMAC). Integrity can also be provided as part of a digital authentication
mechanism utilizing symmetric and asymmetric encryption techniques. Kebina Manandhar et al.
[185] introduced the use of Kalman filters to detect various system attacks, including false
data injection. Because attacks on the power system are reflected as voltage, current or phase
changes, they derived a state space representation from the power grid voltage signal, with
amplitude and phase as variables. Mohammad Esmalifalak et al. [186] proposed two
machine-learning techniques for stealth attack detection. The first employs a statistics-based
anomaly detection algorithm; the second employs a distributed SVM to detect stealthy false data
injection.
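
To give an idea of how a Kalman filter can expose injected measurements, the sketch below
tracks a single voltage amplitude and flags samples whose innovation is implausibly large. It
is a deliberately simplified scalar model with hypothetical noise parameters and threshold, not
the amplitude/phase state-space model of [185].

```python
def detect_injection(measurements, x0=1.0, p0=1e-2, q=1e-6, r=1e-4, gate=9.0):
    """Return the indices of measurements rejected by a scalar Kalman filter.

    x0, p0: initial state estimate (per-unit voltage) and its variance
    q, r:   process and measurement noise variances (assumed values)
    gate:   threshold on the squared normalized innovation (about 3 sigma)
    """
    x, p = x0, p0
    flagged = []
    for k, z in enumerate(measurements):
        p = p + q                     # predict step for a constant-state model
        innovation = z - x            # measurement minus prediction
        s = p + r                     # innovation variance
        if innovation * innovation / s > gate:
            flagged.append(k)         # suspicious sample: skip the update
            continue
        gain = p / s                  # Kalman gain
        x = x + gain * innovation     # update the state estimate
        p = (1.0 - gain) * p          # update the estimate variance
    return flagged


# Per-unit voltage samples; the fifth value is a hypothetical injected reading.
samples = [1.002, 0.998, 1.001, 0.999, 1.30, 1.000, 1.001]
suspicious = detect_injection(samples)
```

Skipping the update for flagged samples keeps a single injected value from dragging the
estimate; a stealthy attack that stays within the gate would not be detected by this simple
test.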


3. Privacy: privacy ensures the right of individuals to control or influence what information
related to them may be collected and stored, by whom, and to whom that information may be
disclosed. A customer may even require that individual equipment usage data not be disclosed to
the utility, so that only an aggregate of such data is sent. Grid communications must ensure
that communication data preserve privacy anywhere and at any time. In [187], the author
distinguished two types of metering data in the grid network. Low-frequency metering data are
the meter readings that a smart meter transmits to the utility at intervals coarse enough
(e.g., every week or month) to offer adequate privacy; they can be used for billing or account
management. High-frequency metering data are the meter readings that a smart meter transmits
often enough (e.g., every few minutes) to reveal information about the customer; they are
intended for regional control centers for fine-grained real-time control and optimization.
Homomorphic encryption mechanisms can be used specifically to preserve the privacy of the
flows. Lei Yang and Fengjun Li [188] proposed a mechanism in which smart meter data are
encrypted using homomorphic encryption and then aggregated to conceal individual readings.
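
The idea of aggregating encrypted readings can be illustrated with an additively homomorphic
scheme. The sketch below uses a textbook Paillier construction as an assumption; it is not
necessarily the scheme of [188], and the tiny primes and meter readings are hypothetical values
chosen for readability only.

```python
import math
import secrets


def paillier_keygen(p, q):
    """Return (n, private_key) for primes p and q, using generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                               # valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, reading):
    """Encrypt an integer reading (0 <= reading < n) under public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, reading, n2) * pow(r, n, n2)) % n2


def aggregate(n, ciphertexts):
    """Multiply ciphertexts; the product decrypts to the sum of the readings."""
    total = 1
    for c in ciphertexts:
        total = (total * c) % (n * n)
    return total


def decrypt(n, private_key, ciphertext):
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return ((u - 1) // n * mu) % n


# Toy primes (assumption); real keys use primes of 1024 bits or more.
n, private_key = paillier_keygen(293, 433)
readings_wh = [120, 340, 75]                      # hypothetical meter readings
encrypted = [encrypt(n, m) for m in readings_wh]
total_wh = decrypt(n, private_key, aggregate(n, encrypted))
```

In this arrangement an aggregator can combine the ciphertexts without the private key, and the
key holder learns only the total of the readings, not the individual values.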


4. Availability: availability ensures the property of being accessible and usable upon demand
by an authorized entity [179]. In a microgrid system, the availability of the smart meters and
the control system is crucial. These components are susceptible to denial-of-service attacks,
in which case legitimate users cannot obtain services from the system. For instance, some PLCs
could be compromised and cease functioning. Moreover, recent technological advances have
enabled the integration of wireless technologies into the grid infrastructure. In such cases,
adversaries may jam the wireless medium, effectively hampering all communications. Thus, the
availability service ensures that the necessary functionalities or services provided by the
SDN-enabled grid are always carried out, even in the case of attacks. The grid usually includes
redundant components in its infrastructure to ensure continuous operation during failures.
Similarly, the SDN-enabled grid can be designed with such redundancy to achieve the
availability service.


5. Confidentiality: refers to the property that information is not made available or disclosed
to unauthorized individuals, entities, or processes [179]. Confidentiality also entails
protection against any unintended information leakage from the applications, controllers, and
devices within the SDN-enabled grid. This is particularly important because the data generated
and collected by grid equipment (e.g., PLCs, IEDs) are highly periodic in nature. For example,
an increased delay in the establishment of a new flow rule in response to an incoming packet
can inform a potential attacker about the behavior of the OpenFlow controller within the
SDN-enabled grid. Such unintended information disclosure from data plane devices,
applications, flows, and controllers should also be considered as part of any confidentiality
service. Conventionally, confidentiality can be provided by adopting either symmetric or
asymmetric key-based encryption schemes [184]. In symmetric encryption, a single shared key is
used among the PLCs, smart meters, IEDs, applications, flows, and network controllers. Examples
of symmetric encryption that can be utilized for the grid include AES and RC4. In asymmetric
encryption, on the other hand, a pair of keys (public and private) is used among the
communicating components of the grid. RSA and ECC are the two most important examples of
asymmetric encryption that could be deployed. Moreover, encryption mechanisms based on fully
homomorphic encryption could be used specifically to preserve the privacy of the flows. IEC
62351 defines several mechanisms that can be used to protect the exchange of information in
automation applications used in the grid. IEC 62351-3 and 62351-5 include provisions for
confidentiality using TLS for encryption between devices in the network [189] [190]. These
protocols also adopt HMAC as specified in ISO/IEC 9798-4.
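
As a configuration-level illustration of TLS protection between devices, the sketch below
builds a client-side TLS context with Python's standard `ssl` module. The minimum protocol
version, the function name and the optional certificate paths are assumptions made for the
example; the profile actually required by IEC 62351-3 should be taken from the standard itself.

```python
import ssl


def make_tls_client_context(ca_file=None, cert_file=None, key_file=None):
    """Create a TLS context that verifies the peer certificate and host name.

    ca_file:   optional CA bundle used to validate the server (system default if None)
    cert_file: optional client certificate, for deployments using mutual authentication
    key_file:  private key that belongs to cert_file
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed lower bound
    if cert_file is not None:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


tls_context = make_tls_client_context()
```

A socket wrapped with this context via `tls_context.wrap_socket(sock, server_hostname=...)`
refuses peers that do not present a certificate chaining to a trusted CA.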


6. Accountability: is the property that ensures that the actions of an entity may be traced
uniquely to the entity [179]. With accountability, the SDN-enabled grid ensures that a device
or software component cannot deny having received a message from, or having transmitted a
message to, another device or application. For instance, a digital signature scheme [191]
based on encryption methods could address accountability. Additionally, proper auditing
mechanisms and logs should be used to provide accountability in the SDN-enabled grid. Xiao et
al. [192] presented a mutual inspection strategy to resolve the issue of non-repudiation in the
grid neighborhood network. The bill readings exchanged between the two parties are used to
calculate the difference between these readings. However, due to power loss during transmission
or dynamic factors caused by the environment, some difference between the readings is
inevitable. Therefore, a threshold value is computed; if the difference does not lie within this
threshold, accountability is lost and the service is terminated.
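
The threshold comparison at the core of this mutual inspection can be sketched as below. How
the threshold is derived in [192] is not reproduced here; a fixed tolerance, the function name
and the sample readings are assumptions for illustration.

```python
def first_disputed_period(supplier_kwh, consumer_kwh, threshold_kwh):
    """Return the index of the first period whose readings differ by more than
    the tolerated threshold, or None if all periods are consistent."""
    for period, (supplied, consumed) in enumerate(zip(supplier_kwh, consumer_kwh)):
        if abs(supplied - consumed) > threshold_kwh:
            return period
    return None


# Hypothetical readings from the two ends of the same power line.
supplier_side = [10.2, 11.0, 9.8]
consumer_side = [10.0, 10.9, 8.9]
disputed = first_disputed_period(supplier_side, consumer_side, threshold_kwh=0.5)
```

Small differences that stay within the tolerance are attributed to line losses, whereas the
third period in this example exceeds it and would trigger the dispute handling.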


7. Access control: access control is the prevention of unauthorized use of a resource, including
the prevention of the use of a resource in an unauthorized manner [179]. Access control
determines which participant of the grid can reach which content or service. Unauthorized use
of a resource in the SDN-enabled grid should be prevented. An unauthenticated application might
try to access resources for which it does not have privileges, or authenticated applications,
IEDs, PMUs, PLCs, and smart meters may abuse their privileges. Proper security measures should
prevent any unauthorized access in the SDN-based grid. Access control is usually achieved using
the following methods [184]:


• Discretionary access control (DAC): in DAC, access control decisions are made based on
the exclusive rights that are set for the flows, applications, IEDs, PLCs, and smart meters.
An entity in DAC can enable another entity to access resources.


• Mandatory access control (MAC): in MAC, the access control function considers the criticality
of the resources and the rights of the flows, applications, IEDs, PLCs, and smart meters on
the resources. In MAC, an entity cannot enable another entity to access the resources.


• Role-based access control (RBAC): in RBAC, access control decisions are based on the roles
created within the SDN-enabled grid. A role can include more than one entity (e.g., flows,
IEDs). Moreover, a role defines what its entities can and cannot do.


• Attribute-based access control (ABAC): in ABAC, access control decisions are based on
the features of the flows, applications, IEDs, PLCs, and smart meters, the resources to be
accessed, and environmental conditions.


Cheung et al. [193] proposed a role-based access control model devised specifically for smart
grid requirements, known as smart-grid role-based access control. This scheme can increase
system reliability and prevent potential security threats. The control center of each regional
network is responsible for managing the security policy for all of its community networks and
can be used as an interface for communication outside the regional network. Bobba et al.
[194] proposed a policy-based encryption scheme for access control in smart grids. The main
element of the scheme is the key distribution center, which distributes keys and access policies
to data senders and receivers. A receiver can decrypt information if it has a valid set of
attributes.
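
A role-based decision of the kind described above can be sketched in a few lines. The role
names, entities and permissions below are hypothetical and serve only to show the lookup; they
are not taken from the model in [193].

```python
# Permissions granted to each role, as (action, resource) pairs (assumed values).
ROLE_PERMISSIONS = {
    "operator": {("read", "measurements"), ("write", "setpoints")},
    "auditor": {("read", "measurements"), ("read", "logs")},
    "meter": {("write", "measurements")},
}

# Roles assigned to each entity; one entity may hold several roles.
ENTITY_ROLES = {
    "scada-app-1": {"operator"},
    "audit-service": {"auditor"},
    "smart-meter-17": {"meter"},
}


def is_allowed(entity, action, resource):
    """Grant access only if one of the entity's roles carries the permission."""
    roles = ENTITY_ROLES.get(entity, set())
    return any(
        (action, resource) in ROLE_PERMISSIONS.get(role, set()) for role in roles
    )
```

Unknown entities and unlisted permissions are denied by default, so a smart meter in this
example can submit measurements but cannot change setpoints.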


Scalable key management: secure end-to-end communication depends on the existence of a 
secret key shared between communicating entities [195]. Thus, it is crucial to design a secure 
and scalable key management scheme to generate, distribute and update the shared 
cryptographic keys. Public Key Infrastructure is a viable solution as a key management scheme 
in the smart grid [196] [197]. 


Tamper-resistant credential protection: most field devices are deployed in remote geographic 
locations exposed to unauthorized physical access. Thus, it is important to provide protection 
against unauthorized modification and disclosure of sensitive information using digital 
certificates in these devices. An efficient solution to provide the required level of protection
for keying materials within field devices is to use a special-purpose cryptographic module, such
as a Trusted Platform Module (TPM) [198].


The SDN-microSENSE project intends to provide multiple benefits, such as increased reliability,
privacy-enabled and cyberattack-resilient tools, better service quality and security, and
efficient utilization of the existing infrastructures. The progress of SDN-microSENSE
development can be measured using a set of Key Performance Indicators [199]. The following KPIs
can be used to evaluate the operation and control of SDN-microSENSE:


1. Technical: technical KPIs identify and quantify the benefits that a technology solution
offers to existing assets and to the quality of service provided to customers. They are derived
by gathering electrical metrics on the network (e.g., voltages/currents collected along feeders
and active/reactive power measured at the interface with the transmission system) and on
customers and producers. Thus, it is vital to evaluate the reliability of network operation by
measuring the reliability indices. The operating environments of a city-area DSO and a
rural-area DSO can be completely different. In particular, the capability for islanded
(microgrid) operation will be essential as the amount of production increases, because it
enables microgrid operation when faults occur in the network. This can enhance the performance
of a DSO in relation to reliability indices. The specific KPIs for electricity distribution
reliability are introduced in Table 20.


Table 20: KPIs related to distribution reliability.


No  Key Performance Indicators


1   System Average Interruption Duration Index (SAIDI). Overall performance in city, urban and
    rural areas. Measured by considering supply criterion in different residential areas.


2   System Average Interruption Frequency Index (SAIFI). DSO’s performance level.


3   Customer Average Interruption Duration Index (CAIDI). DSO’s performance level.


4   Momentary Average Interruption Frequency Index (MAIFI). Overall performance in city,
    urban and rural areas. Measured by considering supply criterion in different
    residential areas.


5   Amount of cabling in the DSO’s medium voltage distribution network. Cabling level.


6   Share of high impedance grounded networks among DSO’s distribution lines. Level
    of compensated networks.


7   Interruption costs. Costs reflecting the inconvenience experienced by network
    customers as a consequence of distribution disturbances.


8   Power system stability. Stability performance of the distribution network.


9   Microgrids, DSO’s effort to implement controlled islanded operation. Level of
    research, development and demonstration activity.


• KPI (1) is commonly used as a reliability indicator. It is the average outage duration for
each customer served per unit of time (hours/year). Because the operating environment varies
between DSOs, the performance is measured separately for city-area, urban-area and rural-area
networks.

• KPI (2) also measures network reliability. It is the average number of interruptions that a
customer experiences, expressed in interruptions per customer per year. The DSO’s performance
in reliability is evaluated by measuring the interruption frequency on the distribution
network.

• KPI (3) is related to the previous two. It can be calculated as the ratio of KPI (1) to
KPI (2), giving the average duration of an outage experienced by a customer (hours per
interruption).

• KPI (4) measures the total number of outages of less than 3 minutes in duration per total
number of customers, expressed in interruptions (< 3 min) per customer per year. Because the
operating environment varies considerably between DSOs, the performance is measured separately
for city-area, urban-area and rural-area networks.

• KPI (5) measures the development of large-scale cabling in medium voltage distribution
networks, with the aim of creating a weatherproof network that can tolerate natural phenomena
such as storms and lightning. This KPI is not comparable across all DSOs, because the
operating environment varies between city-area and rural-area networks.

• KPI (6) measures the share of high impedance grounded medium voltage networks in
comparison with the whole medium voltage distribution network in the DSO's territory.
High impedance grounded networks can enhance network reliability with respect to earth faults,
because a compensated network limits the earth fault current, so that the fault is more likely
to self-extinguish than in an unearthed network.

• KPI (7) measures the average distribution reliability in the form of interruption costs, so
that the impact of interruptions in electricity supply on network users can be evaluated.
Interruptions also cause expenses for network companies in the form of fault repair costs.

• KPI (8) measures the stability of the distribution grid operation. Power system stability
should remain at a high level, even when the share of intermittent RES production increases in
the HV and LV distribution networks. This KPI evaluates the average network stability
performance.

• KPI (9) measures the contribution of the DSO to implementing active microgrid operation in
the network. This enables parts of the network to operate as a controlled island in order to
increase reliability.
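
As an illustration of how KPIs (1) to (4) relate to each other, the following sketch computes
them from a list of interruption records. The record structure, the function names and the
sample data are assumptions; the 3-minute limit between momentary and sustained interruptions
is the one stated for KPI (4).

```python
from dataclasses import dataclass

MOMENTARY_LIMIT_MIN = 3.0  # interruptions shorter than this count as momentary


@dataclass
class Interruption:
    duration_min: float      # duration of the interruption in minutes
    customers_affected: int  # number of customers interrupted


def reliability_indices(interruptions, customers_served):
    """Compute SAIDI (h/customer), SAIFI, CAIDI (h/interruption) and MAIFI."""
    sustained = [i for i in interruptions if i.duration_min >= MOMENTARY_LIMIT_MIN]
    momentary = [i for i in interruptions if i.duration_min < MOMENTARY_LIMIT_MIN]
    saidi = sum(
        i.duration_min / 60.0 * i.customers_affected for i in sustained
    ) / customers_served
    saifi = sum(i.customers_affected for i in sustained) / customers_served
    caidi = saidi / saifi if saifi else 0.0
    maifi = sum(i.customers_affected for i in momentary) / customers_served
    return {"SAIDI": saidi, "SAIFI": saifi, "CAIDI": caidi, "MAIFI": maifi}


# Hypothetical yearly record for a feeder serving 1000 customers.
events = [
    Interruption(duration_min=120, customers_affected=100),
    Interruption(duration_min=30, customers_affected=200),
    Interruption(duration_min=1, customers_affected=500),
]
indices = reliability_indices(events, customers_served=1000)
```

For this record the sustained interruptions amount to 300 customer-hours and 300 customer
interruptions, so SAIDI and SAIFI are both 0.3 and CAIDI is 1 hour per interruption, while the
momentary event contributes only to MAIFI.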

2. Environmental: KPIs of environmental impact, such as CO2 emissions reduction [200]. The KPIs in 
this domain are essential for understanding and evaluating the environmental impact of 
energy/storage and smart grid distribution related solutions. They are important for smart
system planning and operation. The environmental KPIs can be used to evaluate the efficiency of
the energy systems demonstrated in environmental terms, according to the phase when the 
measurement is taken. 

3. Economic: KPIs measuring economic performance, such as the average cost of energy
consumption, the average estimation of cost savings, etc. The economic performance evaluation
takes into account the business efficiency of each application and usage scenario from the
market stakeholder perspective (defining business-oriented KPIs to evaluate the day-to-day
performance of the tools and applications under evaluation). The economic indicator also
considers the capital cost, maintenance cost, generation cost, and replacement cost. For
example, the residents of apartments would like to see the economic benefit of their flexible
consumption behavior, that is, of sacrificing part of their comfort to achieve lower energy
bills. Similarly, the business stakeholder (demand-response aggregator) may like to know the
actual benefit from the implementation of DR strategies in a portfolio of customers.

4. Social: KPIs of social impact, such as the degree of users’ satisfaction with DR services.
The selected indicators reveal that attitudes towards energy are interrelated with demand
response mechanisms, and such KPIs can be used to evaluate the extent to which end users are
willing to participate and are self-motivated for further demonstration and application of the
demonstrated solutions. In general, the social domain visualizes the impact of a technology,
scheme or policy on social factors such as local wealth, unemployment and satisfaction. A
popular approach used in the literature for expressing social KPIs is the Likert scale, since
it is a sensible way to quantify a qualitative value.

5. Legal: KPIs of Legal infrastructure, such as the level of support for electricity/heat integration in 
the legal framework. KPIs in the legal domain monitor the legislative framework concerning the 
application and evolution of the proposed technological solutions. Thus, this specific domain 
allows for assessing the existing legal and regulatory framework and identifying the modifications 
that are needed for the deployment of the technology. 

6. ICT: an effective and reliable communication infrastructure based on two-way data transfer
can be considered an important building block of the smart energy system. The communication
channels enable the information exchange between different parts of the network, especially in
network monitoring and controlling processes. The performance of the communication
infrastructure must be reliable, secure, resilient and effective, especially when the amount of
data and information increases. Specific KPIs for ICT infrastructure technology are given in
Table 21.


Table 21: KPIs related to ICT systems. 


No Key Performance Indicators 


1 Performance of communication channels towards the different grid elements 
(availability, bandwidth, response time). Level of performance 


2 Communication standards and protocols, compliance with European and 
international methods. Level of compliance 


3 Real-time data information exploitation to support the DSO’s internal processes. 
Level of performance 


4 Integration level between different IT-applications related to network control and 
management. Level of integration. 


5 Integration level between different IT-applications related to network control and 
management. Level of integration. 


6 Two-way communication. Enabled alerts, remote control and layouts, reading
logs, coupling status remote monitoring. Performance Level


7 Customer information security / quality of the information. Level of information 
security and reliability 


• KPI (1) describes the capacity of the communication infrastructure. It is important that the
communication channel is continuously available and that the performance of the
communication infrastructure is at a high level. This means that the bandwidth must be high
enough to transfer the increasing amount of data, in order to achieve active network
management and monitoring in real time.

• KPI (2) measures the compliance of the communication standards and protocols with
European and national standards. Thus, it is vital to consider a communication system
that uses common communication protocols and fits the national and international
standards in order to create a uniform infrastructure. A standard communication system is
an important building block of the smart energy system, and this KPI measures its compliance
at national and international level.

• KPI (3) measures how well the communication infrastructure supports the different
operations of network management. In order to achieve flexible and efficient
management of the network, it is crucial to exploit real-time data in network
operation. The communication infrastructure should be able to supply network management
systems such as SCADA with the information they need.

• KPI (4) measures how well the DSO exploits real-time information on the network state
and operation to support the wide range of internal processes of the company. Real-time
information from the meters can be used in LV-level network state calculation and
fault management processes.

• KPI (5) measures the level of integration between different network control systems, i.e.,
the supervisory control system and other systems related to network monitoring and
management.

• KPI (6) measures the ability for two-way communication, which enables many of the
important functionalities of smart devices and network control processes. Measuring energy
flows in both directions will become important as the amount of distributed generation in the
network increases. LV network automation is also an important matter
and needs to be supported by two-way communication.

• KPI (7) measures the security of individual customer-related data, third-party access and
other risks related to data management and utilization. Privacy policy sets many limits on
the availability of customer-related data, and companies must use protected transmission
methods in order to protect network users' privacy. This KPI measures data privacy and
security issues and estimates the performance of the companies in achieving
secure and protected communication between the stakeholders involved. Data
protection involves authenticated and authorized data access, ensuring data integrity and
data confidentiality, and should be compatible with the security standards defined in IEC
62351. Furthermore, the selected security solution should make it possible, when required, to
implement end-to-end data protection at the application level, from source to destination,
with a single set of credentials, opening the possibility for transmitted data to transit via
platforms that are not necessarily trusted. The quality of information must also be sufficient.


Additionally, network availability can be measured as the fraction of time that network
connectivity is available between a specified ingress point and a specified egress point. It
directly influences service availability, which is defined as the fraction of time that a
service is available between a specified ingress point and a specified egress point within the
bounds of a defined network availability. The reliability analysis of the overall smart grid
communications system requires definitions of node reliability and availability in the time and
spatial domains. Node reliability can be defined based on two important metrics, namely Mean
Time Between Failure (MTBF), the average time between node failures, and Mean Time To Repair
(MTTR), the average time needed for a node in outage to be repaired and become operational.
Availability is the degree to which a system, element or component is operational and
accessible when required for use, and is defined as:


Availability [%] = 100 × MTBF / (MTBF + MTTR)


MTBF is a statistically established metric, measured in the field on a large population of
elements over a long period. MTTR is also measured statistically in the field and is the repair
time until normal operation of the node/element is re-established.
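
A short worked example of the availability definition above, expressed as a percentage. The
MTBF and MTTR values and the function names are hypothetical; the second helper translates the
result into expected downtime over a period, which is often easier to interpret.

```python
HOURS_PER_YEAR = 8760


def availability_percent(mtbf_hours, mttr_hours):
    """Availability in percent from MTBF and MTTR (both in hours)."""
    return 100.0 * mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_hours(mtbf_hours, mttr_hours, period_hours=HOURS_PER_YEAR):
    """Expected time in outage over the given period."""
    return period_hours * mttr_hours / (mtbf_hours + mttr_hours)


# Hypothetical node: one failure every 8750 hours, 10 hours to repair.
node_availability = availability_percent(8750, 10)
node_downtime = expected_downtime_hours(8750, 10)
```

With these values the node is available for roughly 99.89 % of the time, which corresponds to
about 10 hours of outage per year.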


4. Recommendations 


According to the information provided in the previous sections, in this section we summarize what we 
consider the main recommendations for the future elicitation of the functional and non-functional 
requirements in the SDN-microSENSE project. 


4.1 Asset Management 


The asset management system should use an intelligent approach in order to protect new and
existing assets by future-proofing both large-scale grids and microgrids. This intelligent grid
management system should determine an optimal maintenance strategy and an optimal power flow
control of the grid, based on the condition monitoring and diagnostic results of the IoT devices
[201]. In this way, the life expectancy of aged infrastructure (e.g., transformers and circuit
breakers) will be maximized, and suitable power flow routes and maintenance strategies can be
derived. We recommend that the evaluation of cybersecurity solutions on asset management be
based on the KPIs in Table 2 and on the suggested tools of subsection 3.1.3 (Key performance
indicators).


4.2 Business Environment 


We recommend that the evaluation of cybersecurity solutions on business environment be based on
the KPIs described in Table 3, and on the PESTLE tool, as described in subsection 0. This tool
can be used as the framework to analyze and monitor the macro-environmental factors that
crucially impact an organization.


4.3 Governance and Risk Management 


As described before, governance and risk management systems automate the work associated with
the documentation and reporting of the compliance activities inside an organization. Compliance
management functionalities, such as financial reporting compliance, policy management and
industry-specific regulations and standards, are typically supported by these systems. For the
SDN-microSENSE project, our recommendation is to base governance and risk management on the
KPIs in Table 4 and on the suggested tools in subsection 0.


4.4 Risk Assessment 


By determining its business risk exposure, an organization can identify the functions that are
prone to the greatest risk, allowing its risk assessment to focus on the most highly exposed
areas. We consider that the risk assessment system must be based on the relevant security
standards for the evaluation and identification of potential vulnerabilities of the
organizational assets (i.e. hardware, software, data, and personnel), as well as of the
organization's procedures, processes, and information transfers associated with a specific IT
system. The evaluation of cybersecurity solutions on risk assessment should be based on the
KPIs in Table 5 and on the suggested tools in subsection 0.


4.5 Risk Management Strategy 


An effective risk management strategy should be able to recognize existing and potential
risks and implement appropriate measures to mitigate and manage these risks. It should also
estimate the probability of occurrence of a risk and evaluate which operations might be
impacted by the occurrence of a specific risk event. In this case, the recommendation is to base
the evaluation of the cybersecurity solutions on risk management strategy on the KPIs described
in Table 6.


4.6 Supply Chain Risk Management 


Supply chain risk extends to the entire business and operational environment of an
organization. Technology is used across all functions of an enterprise, making it vulnerable to
threats such as cyber-terrorism, malware and data theft. Thus, cybersecurity in the supply
chain is deemed a required risk-avoidance strategy for large and small-scale organizations. A
supply chain risk management strategy should consider the mitigation of supply chain failure
events (e.g., the use of multiple suppliers to mitigate supplier failures) and enforce
preventive measures to reduce the probability of occurrence of a threat. In our case, the
recommendation is to base the evaluation of the cybersecurity solutions on supply chain risk
management on the KPIs in Table 7 and on the tools described in subsection 0.


4.7 Identity and Control Management


The identity and control management system should use a decentralized approach by separating
the large grid into networked microgrids. This can be achieved by utilizing blockchain
technology and Directed Acyclic Graphs. It should also unify functions across the OT and IT
networks and encompass a workflow capability that can change an existing user's access to the
different networks and systems, as well as assign users access privileges and requirements
based on sets of configurable business rules. The evaluation of cybersecurity solutions should
be based on the KPIs in Table 8 and on some of the tools described in subsection 0; among them,
we especially recommend Microsoft Azure, IBM Security Identity and Access Assurance, and the
RSA SecurID Suite.


4.8 Awareness and Training 


We recommend that the organizations involved actively train users and employees in appropriate
security practices, such as password usage and management, use of anti-virus and anti-malware
tools, effective patch management, and the handling of spam and of emails/attachments from
unknown senders. They must also create comprehensive training programmes for software
developers and system administrators. All this can be done by using virtual classrooms,
instructor-led sessions, IT security days and periodic newsletters. Organizations should also
alert and advise users and employees about possible threats using regular and quick means of
communication. They should also implement continuity plans for this and provide specific
training and awareness for disaster recovery situations. We consider that the evaluation of
cybersecurity solutions on awareness and training should be based on the application of the
KPIs in Table 9 and on the solutions presented in subsection 0.


4.9 Data Security 


The data security systems should keep the data flow secure and continuous by satisfying
fundamental requirements such as confidentiality, availability and integrity. They should also
protect data from being accessed by unauthorized users and guarantee that data are accessible
in a timely manner, ensuring accuracy and trustworthiness. We recommend that, in the context of
the SDN-microSENSE project, data security be based on the following means:


• Usage of the DoS-Resistant Broadcast Authentication Protocol,

• Usage of a Fuzzy Cognitive Model, or

• Usage of a cyberattack defense mechanism in which Hypothesis Testing, Composed Measurement
Error and the Largest Normalized Error Test are combined to produce an optimal result.

• Additionally, the encryption of the network could be performed through AES or Blowfish.


The evaluation of the data security systems should be based on the KPIs in Table 10. The tools
discussed in subsection 0 should also be used for the evaluation. Among these tools, we
consider Kaspersky Endpoint Security, IBM Security Guardium, and Check Point Data Loss
Prevention the most relevant.


4.10 Information Protection Processes and Procedures 


Information protection processes and procedures must be implemented by utilizing any of the
protocols suitable for the system, as presented in Table 1 (e.g. NISTIR 7628, NERC CIP).
Organizations should also implement a System Development Life Cycle to manage systems [1]. In
addition, organizations must ensure that regular backups are conducted, maintained, and tested,
that response and recovery plans are tested, that data are destroyed according to a specific
policy, and that the effectiveness of protection processes is shared. Organizations should also
develop and implement a vulnerability management plan. The evaluation of cybersecurity
solutions on Information Protection Processes and Procedures should be based on the KPIs in
Table 11, which should be computed by running use cases in both a normal case study and a
secured case study.


4.11 Maintenance 


The maintenance of the Smart Grid architecture should be achieved by calculating the reliability of the 
protection systems in order to find the optimal maintenance plan. This can be done in multiple ways, 
e.g.: 


• by using Markov processes, 

• utilizing an on-line Reinforcement Learning algorithm and Artificial Neural Networks, 

• utilizing a multi-objective maintenance system based on a global criterion approach considering 
the relevant constraints of the load balance and the maintenance time intervals of units. 
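
As an illustration of the Markov-process option listed above, the sketch below uses the simplest 
such model: a protection component that alternates between an "up" and a "down" state with a 
constant failure rate and a constant repair rate, whose steady-state availability is 
repair_rate / (failure_rate + repair_rate). The plan names and the rates are hypothetical and only 
show how candidate maintenance plans could be compared; they are not results from the surveyed 
literature. 

```python
def steady_state_availability(failure_rate, repair_rate):
    """Long-run fraction of time a two-state (up/down) Markov component is up.

    Both rates are expressed per unit of time (e.g. per hour).
    """
    if failure_rate < 0 or repair_rate <= 0:
        raise ValueError("failure_rate must be >= 0 and repair_rate must be > 0")
    return repair_rate / (failure_rate + repair_rate)


def rank_plans(plans):
    """Sort plan names from highest to lowest steady-state availability.

    plans: mapping of plan name -> (failure_rate, repair_rate)
    """
    return sorted(
        plans,
        key=lambda name: steady_state_availability(*plans[name]),
        reverse=True,
    )


# Hypothetical assumption: more frequent maintenance lowers the failure rate.
candidate_plans = {
    "annual": (0.002, 0.1),
    "quarterly": (0.0005, 0.1),
}
best_plan = rank_plans(candidate_plans)[0]
```

A realistic study would also weigh the cost and downtime of the maintenance itself, which this 
two-state sketch deliberately ignores. 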


The organization should perform and log the maintenance and repair of organizational assets with 
approved and controlled tools. It should also perform maintenance operations at predetermined 
authorized times or on an approved as-needed basis. In addition, the organization should develop 
and sustain maintenance policies and procedures to facilitate the implementation of the information 
system security maintenance requirements and the associated information system security 
maintenance controls. The evaluation of cybersecurity solutions on Maintenance should be made by 
using the KPIs in Table 12. 


4.12 Protective Technology 


The protective technology system should be implemented through a holistic attack-resilient 
framework that protects the integrated Distributed Energy Resources (DER) and the critical power 
grid infrastructure. This should rely on a monitoring system, which could be based on Cyber Graph 
(a tool to assess the impact of cyberattacks), so that the grid is able to monitor and analyze 
changing conditions. The protective technology system might also be based on so-called E-LANs, 
which are energy networks with a high degree of flexibility, reliability, robustness and readiness. 
The evaluation of cybersecurity solutions on protective technology should be made by applying the 
KPIs in subsection 0. 


4.13 Intrusion Detection and Prevention Processes 


Contemporary intrusion detection mechanisms related to EPES should be capable of monitoring the 
overall infrastructure as well as each asset, including both electrical-related and IT-related 
assets. SIEM systems constitute the state-of-the-art solution for monitoring an environment as well 
as aggregating and normalizing the collected information. A complete IDS should combine various 
intrusion detection techniques (signature-based, anomaly-based and specification-based), thus 
detecting and preventing a plethora of cyberattacks, anomalies and zero-day attacks. In such a 
system, the signatures of the signature-based IDS should be updated continuously in order to detect 
new intrusions, and the specifications of the specification-based IDS should be updated whenever 
the monitored environment changes. 
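
The combination of the three detection techniques can be sketched as follows. This is a toy 
illustration only: the byte signatures, the z-score threshold and the rule that only read-type 
Modbus function codes (1 to 4) are permitted are all hypothetical assumptions, and the Event 
structure is invented for this example rather than taken from any IDS product. 

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Event:
    payload: bytes             # raw application payload
    packets_per_second: float  # observed traffic rate for the flow
    function_code: int         # e.g. Modbus function code of the request


# Hypothetical signature database (known malicious byte patterns)
SIGNATURES = [b"\x90\x90\x90\x90", b"/bin/sh"]
# Hypothetical specification: this device may only receive read requests
ALLOWED_FUNCTION_CODES = {1, 2, 3, 4}


def detect(event, baseline_rates, z_threshold=3.0):
    """Return the list of techniques that raised an alert for this event."""
    alerts = []

    # Signature-based: match known byte patterns
    if any(sig in event.payload for sig in SIGNATURES):
        alerts.append("signature")

    # Anomaly-based: z-score of the traffic rate against a learned baseline
    mu, sd = mean(baseline_rates), pstdev(baseline_rates)
    if sd > 0 and abs(event.packets_per_second - mu) / sd > z_threshold:
        alerts.append("anomaly")

    # Specification-based: anything outside the allowed behaviour is flagged
    if event.function_code not in ALLOWED_FUNCTION_CODES:
        alerts.append("specification")

    return alerts


baseline = [100, 102, 98, 101, 99]
suspicious = detect(Event(b"GET /bin/sh", 500.0, 16), baseline)
benign = detect(Event(b"\x00\x01", 100.5, 3), baseline)
```

Keeping the three checks independent reflects the maintenance point made above: signatures and 
specifications can each be updated without touching the other detectors. 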


The intrusion detection mechanisms devoted to protecting critical infrastructures, such as EPES, 
have to be resilient against techniques that aim to bypass them, such as code packing and 
encryption, packet fragmentation, obfuscation, code mutation and DoS attacks. IDS for EPES should 
therefore offer appropriate prevention capabilities, for example the isolation of malicious network 
flows, thus ensuring the normal operation of the system. Modern IDS should also be able to 
recognize and mitigate attacks against industrial application-layer protocols (such as Modbus, 
DNP3, IEC 61850 GOOSE and MMS, and IEC 60870-5-104, among others). 


We consider that SDN is a promising technology that can significantly contribute to the intrusion 
detection and prevention processes. In particular, SDN can be used to isolate network flows 
considered malicious. However, although SDN is an emerging technology that cybersecurity-related 
applications can exploit, it also introduces certain security issues, mainly because it typically 
relies on a centralized point (the SDN controller) that may take decisions about the overall 
network or specific sensitive devices. For this reason, the SDN controller itself must be 
appropriately protected against a variety of cyberattacks. 
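
The idea of isolating a malicious flow can be sketched with a toy, in-memory flow table in which 
the highest-priority matching rule decides what happens to a packet. The FlowRule and FlowTable 
classes, the priority values and the IP addresses are hypothetical and do not correspond to the API 
of any particular SDN controller. 

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowRule:
    priority: int
    src_ip: Optional[str]  # None acts as a wildcard
    dst_ip: Optional[str]  # None acts as a wildcard
    action: str            # "forward" or "drop"


class FlowTable:
    """Toy flow table: the highest-priority matching rule wins."""

    def __init__(self):
        # Default low-priority rule: forward everything
        self.rules = [FlowRule(0, None, None, "forward")]

    def isolate(self, src_ip):
        """Install a high-priority drop rule for a source flagged as malicious."""
        self.rules.append(FlowRule(100, src_ip, None, "drop"))

    def decide(self, src_ip, dst_ip):
        matching = [
            rule for rule in self.rules
            if rule.src_ip in (None, src_ip) and rule.dst_ip in (None, dst_ip)
        ]
        return max(matching, key=lambda rule: rule.priority).action


table = FlowTable()
before = table.decide("10.0.0.5", "10.0.0.9")
table.isolate("10.0.0.5")  # e.g. triggered by an IDS alert
after = table.decide("10.0.0.5", "10.0.0.9")
```

Because every such rule is installed through the controller, the sketch also shows why a 
compromised controller would be able to isolate, or fail to isolate, any device in the network. 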


4.15 Anomaly Detection 


Regarding anomaly detection, we recommend mapping the physical processes and data exchanges, as 
well as the anomalies and events appearing in the grid-based use cases. Indicative intrusion 
scenarios can then be developed to prepare detection examples for substation attacks. These will 
include attacks on the physical devices (e.g. relays, servers, PMUs, BCUs) or on the data 
processing parameters, for example in the load forecasting process. 


4.16 Incidents Response 


We consider that the development of a response plan should follow the formulation presented in the 
CEER report applicable to European energy utilities. Based on this, communications during and after 
the events should be clear and focus on problem resolution, following an action plan devised for 
this purpose. Mitigation activities could include the development and deployment of honeypots at 
selected smart grid points that are considered ‘popular’ among attackers, and the introduction of 
specific algorithms for the identification of abnormal data in data streams. Also, lessons learned 
should trigger continuous improvements in the response approaches. During the identification of 
use-case details, the KPIs in Table 15 can be used for evaluating the performance of the response 
practices, and they can be refined to express the performance against specific timeframes and 
services/applications. Finally, an issue tracking system could help to ensure that each cyberattack 
is analyzed and resolved in a timely manner. 
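
As a sketch of how time-based response indicators could be derived from issue-tracker records, the 
example below computes the mean time to detect and the mean time to resolve. The record fields 
(occurred, detected, resolved), the timestamps and the choice of these two indicators are 
assumptions made for illustration; they are not necessarily the KPIs defined in Table 15. 

```python
from datetime import datetime
from statistics import mean


def mean_minutes(incidents, start_key, end_key):
    """Average elapsed time in minutes between two timestamps of each incident."""
    return mean(
        (incident[end_key] - incident[start_key]).total_seconds() / 60
        for incident in incidents
    )


# Hypothetical records exported from an issue tracking system
incidents = [
    {
        "occurred": datetime(2020, 1, 10, 10, 0),
        "detected": datetime(2020, 1, 10, 10, 10),
        "resolved": datetime(2020, 1, 10, 11, 0),
    },
    {
        "occurred": datetime(2020, 1, 12, 12, 0),
        "detected": datetime(2020, 1, 12, 12, 30),
        "resolved": datetime(2020, 1, 12, 13, 0),
    },
]

mean_time_to_detect = mean_minutes(incidents, "occurred", "detected")
mean_time_to_resolve = mean_minutes(incidents, "detected", "resolved")
```

The same function could be applied per service or per application by filtering the records first, 
which matches the refinement against specific timeframes and services mentioned above. 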


5. Conclusions 

The main objective of WP2 is to describe in detail the functional and technical specifications of 
the SDN-microSENSE architecture. In this context, this deliverable provides the initial overview of 
state-of-the-art cybersecurity solutions and technologies in EPES, which will support the 
architecture definition. After an initial section providing a brief background on the relevant 
concepts (critical energy/electric systems, microgrids and relevant state-of-the-art standards), 
the state-of-the-art solutions have been categorized across the five cybersecurity functions 
described by NIST (identify, protect, detect, respond and recover). Each of these functions has 
been treated and evaluated in accordance with the following steps: 


1. Provide the definition of the function by NIST. 

2. Provide the categories of cybersecurity solutions identified by NIST for each function. 

3. Provide a discussion over the theoretical background specific to the function, relevant to EPES. 

4. Provide Key Performance Indicators for the evaluation of cybersecurity solutions within the 
categories of each function. 

5. Discuss state of the art solutions relevant to the function, with primary focus on research results, 
but also a brief presentation of currently available products. 


After this, a complete series of recommendations (also aligned with the NIST concepts) and tool 
references are provided for use in subsequent stages of the project, in order to ease the 
elaboration of requirements, specifications and the architecture design. We consider that this 
deliverable describes a wide set of options and tools that can be of use in those later stages. 